1 Preface


These lecture notes were written when the course in integration theory was, for the
first time in more than twenty years, given jointly by the two divisions Mathematics
and Mathematical Statistics. The major source is G. B. Folland: Real Analysis,
Modern Techniques and Their Applications. However, the parts on probability 
theory are mostly taken from D. Williams: Probability with Martingales. Another 
source is Christer Borell’s lecture notes from previous versions of this course, see 


www.math.chalmers.se/Math/Grundutb/GU/MMA110/A11/ 


2 Introduction 


This course introduces the concepts of measures, measurable functions and Lebesgue
integrals. The integral used in earlier math courses is the so-called Riemann
integral. The Lebesgue integral will turn out to be more powerful in the sense that
it allows us to define integrals not only of Riemann integrable functions, but also
of some functions for which the Riemann integral is not defined. Most importantly,
however, it will allow us to rigorously prove many results whose counterparts in the
Riemann setting are usually stated without proof at the basic and intermediate
level. Such results include precise conditions for when we can interchange the order
of integrals and limits, change the order of integration in multiple integrals, and
use integration by parts. Of course, we will also prove many new results.

The concept of measurability is an advanced one, in the sense that a lot of 
people at first find it difficult to master; it tends to feel fundamentally more abstract 
than things one has encountered before. Therefore, a natural first question is why 
the concept is needed. To answer this, consider the following example. 

Let X = R/Z, the circle of circumference 1, with addition and multiplication 
defined modulo 1. Suppose we want to introduce the concept of the length of 
subsets of X. A natural first assumption is that one should be able to do this so 
that the length is defined for all subsets of X. It is also extremely natural to claim 
that the length $l$ should satisfy


• $l(\emptyset) = 0$,

• $l(X) = 1$,

• $l(\bigcup_{n=1}^\infty A_n) = \sum_{n=1}^\infty l(A_n)$ for all disjoint $A_1, A_2, \ldots$,

• $l(A + x) = l(A)$ for all $A \subseteq X$ and $x \in X$.


However, if we insist on defining $l$ for all subsets, this turns out to be impossible.
Let us see why.

Partition $X$ into equivalence classes by saying that $x$ and $y$ are equivalent if
$x - y$ is a rational number. By the axiom of choice, there exists a set $A$ containing
exactly one element from each equivalence class. For each $q \in \mathbb{Q} \cap X$, let
$A_q = A + q$. Then $\bigcup_q A_q = X$, since for each $x \in X$, $A$ contains an element $y$
equivalent to $x$, i.e. $x \in A_{x-y}$ and $x - y \in \mathbb{Q}$.

On the other hand, the $A_q$'s are disjoint, for if $x \in A_{q_1} \cap A_{q_2}$ with $q_1 \neq q_2$,
then $x = y + q_1 = z + q_2$ for two distinct elements $y, z \in A$. However, then
$y - z = q_2 - q_1 \in \mathbb{Q}$, so $y$ and $z$ are equivalent, contradicting the construction of $A$.

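If we could assign lengths to the $A_q$'s, then these lengths would all be equal by the
fourth condition on $l$. On the other hand, the lengths of the countably many disjoint
sets $A_q$ would have to sum to $l(X) = 1$ by the second and third conditions. These
requirements are incompatible: a countably infinite sum of equal terms is either $0$
(if the common value is $0$) or $\infty$ (otherwise), never $1$.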

The moral of the example is that the set A must be declared non-measurable; 
no length of A can be defined. The construction of the example is based on the 
axiom of choice and it can be shown that all constructions of non-measurable sets 
must rely on the axiom of choice. 

There are even more absurd examples than this one. The famous Banach-Tarski
paradox proves, using the axiom of choice, that for any two bounded sets in
$\mathbb{R}^3$ with non-empty interior, the one can be divided into a finite number of
parts which can be translated and rotated and mirrored and then put back together to
form the other. For example: any grain of sand can be divided into a number of pieces
that can be put back together to form a ball the size of the earth! Clearly these
pieces cannot have a well-defined volume.

Examples like these call for a theory of measures and measurable sets. 


3 Measures 


We are going to consider measures in a very general framework: we will consider
measures on an abstract space $X$ on which we make no initial assumptions
whatsoever. As the above example revealed, it is not always possible to define
meaningful measures on all subsets of $X$. Hence we need a notion of which classes
of subsets a desired measure can be defined on. The last two conditions on
a length measure in the above example were natural in that particular situation,
but it is easy to think of other situations where neither of them is natural or even
meaningful. The first two conditions, however, are such that they should hold for
anything that deserves to be called a measure, no matter what structure $X$ has.
Thus we keep those two conditions in mind, and ask for classes of subsets large
enough to ensure that all interesting set operations on measurable sets result in a
measurable set, but restrictive enough to make sure that no conflict with the basic
assumptions arises. The answer is $\sigma$-algebras.


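3.1 Algebras and $\sigma$-algebras


Definition 3.1 Let $\mathcal{A}$ be a class of subsets of $X$ such that


(i) $X \in \mathcal{A}$,
(ii) $E^c \in \mathcal{A}$ whenever $E \in \mathcal{A}$,
(iii) $E \cup F \in \mathcal{A}$ whenever $E, F \in \mathcal{A}$.


Then $\mathcal{A}$ is called an algebra (on $X$).


Note that by (i) and (ii), $\emptyset = X^c \in \mathcal{A}$. Also, if $E, F \in \mathcal{A}$, then
$E \cap F = (E^c \cup F^c)^c \in \mathcal{A}$ by (ii) and (iii).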


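Definition 3.2 Let $\mathcal{M}$ be a class of subsets of $X$ such that


(i) $X \in \mathcal{M}$,
(ii) $E^c \in \mathcal{M}$ whenever $E \in \mathcal{M}$,
(iii) $\bigcup_{n=1}^\infty E_n \in \mathcal{M}$ whenever $E_1, E_2, \ldots \in \mathcal{M}$.


Then $\mathcal{M}$ is called a $\sigma$-algebra.


Clearly any $\sigma$-algebra is an algebra. As above $\emptyset \in \mathcal{M}$, and analogously, if
$E_1, E_2, \ldots \in \mathcal{M}$, then $\bigcap_n E_n = (\bigcup_n E_n^c)^c \in \mathcal{M}$.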

A measure will always be defined on a $\sigma$-algebra. The smallest possible
$\sigma$-algebra on any space $X$ is $\{\emptyset, X\}$. The largest $\sigma$-algebra is
$\mathcal{P}(X)$, the class of all subsets of $X$ (but we have seen that meaningful measures
cannot always be defined on this $\sigma$-algebra).

If $\mathcal{M}$ is a $\sigma$-algebra on $X$, then the pair $(X, \mathcal{M})$ is called a measurable space
and a set $E \in \mathcal{M}$ is called $\mathcal{M}$-measurable.
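On a finite space the conditions of Definition 3.1 can be checked mechanically, and there an algebra is automatically a $\sigma$-algebra, since only finitely many distinct unions exist. The following is a minimal sketch under that finiteness assumption; the function name `is_algebra` and the example families are hypothetical and chosen only for illustration.

```python
from itertools import combinations

def is_algebra(X, family):
    """Check (i)-(iii) of Definition 3.1 for a family of subsets of a finite set X."""
    X = frozenset(X)
    fam = {frozenset(E) for E in family}
    if X not in fam:                                  # (i)  X is in the family
        return False
    if any(X - E not in fam for E in fam):            # (ii) closed under complements
        return False
    # (iii) closed under unions of two sets
    return all(E | F in fam for E, F in combinations(fam, 2))

X = {1, 2, 3, 4}
good = [set(), {1, 2}, {3, 4}, X]
bad = [set(), {1}, {2}, X]   # lacks e.g. the complement {2, 3, 4} of {1}

print(is_algebra(X, good))
print(is_algebra(X, bad))
```

The check is quadratic in the size of the family, which is fine for small illustrations but not intended for anything larger.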


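3.2 Generated $\sigma$-algebras


Let $\mathcal{C}$ be an arbitrary class of subsets of $X$. We define the $\sigma$-algebra generated by
$\mathcal{C}$ as the smallest $\sigma$-algebra containing $\mathcal{C}$, i.e.

$$\sigma(\mathcal{C}) = \bigcap \{\mathcal{F} : \mathcal{F} \text{ a } \sigma\text{-algebra}, \ \mathcal{F} \supseteq \mathcal{C}\}.$$

(It is an easy exercise to show that any intersection of $\sigma$-algebras is a $\sigma$-algebra.)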
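For a finite $X$ the generated $\sigma$-algebra can also be computed from the inside, by repeatedly adding complements and unions until nothing new appears. This is a sketch of that closure procedure, not of the intersection formula itself; the name `generated_sigma_algebra` and the example class are assumptions made for illustration.

```python
def generated_sigma_algebra(X, C):
    """Smallest family containing C and X that is closed under complements and
    unions. For a finite X this coincides with sigma(C)."""
    X = frozenset(X)
    fam = {X} | {frozenset(E) for E in C}
    changed = True
    while changed:
        changed = False
        current = list(fam)
        for E in current:
            # candidates: the complement of E and all unions of E with known sets
            for G in [X - E] + [E | F for F in current]:
                if G not in fam:
                    fam.add(G)
                    changed = True
    return fam

sigma = generated_sigma_algebra({1, 2, 3, 4}, [{1}, {1, 2}])
print(sorted(sorted(E) for E in sigma))
```

In this example the generated family consists of all unions of the three blocks {1}, {2} and {3, 4}, so the points 3 and 4 cannot be separated by any measurable set.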

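The most important example is the Borel $\sigma$-algebra; if $X$ is a topological space
and $\mathcal{T}$ is the class of open sets, then the Borel $\sigma$-algebra, $\mathcal{B}(X)$, is given by

$$\mathcal{B}(X) = \sigma(\mathcal{T}).$$

Since any open set in $\mathbb{R}$ is a countable union of open intervals, it follows that

$$\mathcal{B}(\mathbb{R}) = \sigma((a, b) : a, b \in \mathbb{R}).$$

It is now easy to see (check this!) that we also have

$$\mathcal{B}(\mathbb{R}) = \sigma([a, b) : a, b \in \mathbb{R}) = \sigma((a, b] : a, b \in \mathbb{R}) = \sigma([a, b] : a, b \in \mathbb{R})$$
$$= \sigma((-\infty, b) : b \in \mathbb{R}) = \sigma((a, \infty) : a \in \mathbb{R}).$$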

In integration theory, one often works with the extended real line,
$\bar{\mathbb{R}} = [-\infty, \infty]$ and, even more, with the extended positive half-line
$\bar{\mathbb{R}}_+ = [0, \infty]$. Here the arithmetic involving the points $\infty$ and $-\infty$
works as one would intuitively guess, and a subset is regarded as open if it is a
union of sets each of which is either an open subset of $\mathbb{R}$ or of the form
$[-\infty, a)$ or $(a, \infty]$. It is now straightforward to prove
analogous expressions for $\mathcal{B}(\bar{\mathbb{R}})$ and $\mathcal{B}(\bar{\mathbb{R}}_+)$.


3.3 Measures


If $\mathcal{C}$ is a class of subsets of $X$ and $\mu_0 : \mathcal{C} \to \bar{\mathbb{R}}_+$, then $\mu_0$ is called a
set function. Let $\mathcal{A}$ be an algebra. If $\mu_0$ is a set function on $\mathcal{A}$ such that
$\mu_0(\emptyset) = 0$ and $E, F \in \mathcal{A}$, $E \cap F = \emptyset$ implies
$\mu_0(E \cup F) = \mu_0(E) + \mu_0(F)$, then $\mu_0$ is said to be additive. If
$\mu_0(\emptyset) = 0$ and $\mu_0$ satisfies the stronger condition that
$\mu_0(\bigcup_n E_n) = \sum_n \mu_0(E_n)$ whenever $E_1, E_2, \ldots \in \mathcal{A}$ are disjoint and
$\bigcup_n E_n \in \mathcal{A}$, then $\mu_0$ is said to be countably additive
or a premeasure. (Stronger since additivity follows from countable additivity by
taking $E_1 = E$, $E_2 = F$ and $E_3 = E_4 = \ldots = \emptyset$.)


Definition 3.3 Let $\mathcal{M}$ be a $\sigma$-algebra and $\mu$ a set function defined on $\mathcal{M}$. If $\mu$ is
countably additive, then $\mu$ is said to be a measure.


Let $\mu$ be a measure on the $\sigma$-algebra $\mathcal{M}$. Here are a few classifications.

• $\mu$ is said to be finite if $\mu(X) < \infty$.

• $\mu$ is said to be a probability measure if $\mu(X) = 1$.

• $\mu$ is said to be $\sigma$-finite if there exist sets $E_1, E_2, \ldots \in \mathcal{M}$ such that
$\bigcup_n E_n = X$ and $\mu(E_n) < \infty$ for all $n$.

• $\mu$ is said to be semi-finite if for every $E \in \mathcal{M}$ such that $\mu(E) = \infty$, there
exists a set $F \in \mathcal{M}$, $F \subseteq E$, such that $0 < \mu(F) < \infty$.


The trivial measure is the measure $\mu$ with $\mu(E) = 0$ for all $E \in \mathcal{M}$. Clearly
any probability measure is finite, any finite measure is $\sigma$-finite and every $\sigma$-finite
measure is semi-finite.


Example. Let $\mu(\emptyset) = 0$ and $\mu(E) = \infty$ for any nonempty measurable $E$. Then
$\mu$ is a measure which is not even semi-finite. $\Box$


Example. Length measure on $[0, 1]$ (which, to be honest, we have not defined yet) is
a probability measure. Length measure on $\mathbb{R}$ is $\sigma$-finite; take e.g. $E_n = (-n, n)$.


When $\mathcal{M}$ is a $\sigma$-algebra on $X$ and $\mu$ is a measure on $\mathcal{M}$, the triple $(X, \mathcal{M}, \mu)$
is called a measure space. If $\mu(X) = 1$, then we may also speak of $(X, \mathcal{M}, \mu)$ as
a probability space and if we do that, we usually refer to $\mathcal{M}$-measurable sets as
events.


Remark. Suppose that $\mu(X) = 1$. Then we can choose to call $\mu$ a probability
measure and $(X, \mathcal{M}, \mu)$ a probability space. Whether or not we actually do that
depends on the point of view we want to adopt. In many situations it is either our
main purpose to model a random experiment or it is instructive or useful for some
other reason to think of the points $x \in X$ as the possible outcomes of a random
experiment. If this is not the case, we may instead prefer to just refer to $\mu$ as a
finite measure of total mass 1.


Some general properties of measures follow. In all of these, it is assumed that
$(X, \mathcal{M}, \mu)$ is a measure space.


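Proposition 3.4 (a) $E, F \in \mathcal{M}$, $E \subseteq F \Rightarrow \mu(E) \le \mu(F)$,
(b) $E_1, E_2, \ldots \in \mathcal{M} \Rightarrow \mu(\bigcup_n E_n) \le \sum_n \mu(E_n)$,
(c) If $\mu(X) < \infty$, then $\mu(E \cup F) = \mu(E) + \mu(F) - \mu(E \cap F)$,
(d) If $\mu(X) < \infty$, $E, F \in \mathcal{M}$ and $E \subseteq F$, then $\mu(F \setminus E) = \mu(F) - \mu(E)$.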


Proof. By additivity of $\mu$, $\mu(F) = \mu(E) + \mu(F \setminus E)$ whenever $E \subseteq F$.
This proves (d) and since $\mu(F \setminus E) \ge 0$, (a) follows too. For (b), let $F_1 = E_1$
and recursively $F_n = E_n \setminus (\bigcup_{j=1}^{n-1} E_j)$, $n = 2, 3, \ldots$. Then the $F_n$'s are
disjoint and $\bigcup_n F_n = \bigcup_n E_n$, so by (a)

$$\mu(\bigcup_n E_n) = \sum_n \mu(F_n) \le \sum_n \mu(E_n).$$

Finally (c) follows from

$$\mu(E \cup F) = \mu(E) + \mu(F \setminus (E \cap F)) = \mu(E) + \mu(F) - \mu(E \cap F)$$

by additivity and (d). $\Box$
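The statements of Proposition 3.4 can be checked exhaustively on a small example. The sketch below assumes a three-point space with hypothetical weights and the measure that sums the weights of the points of a set (the $\sigma$-algebra being all subsets); it illustrates the proposition and is in no way a proof.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical point weights; mu(E) is the sum of the weights of the points of E.
w = {"a": Fraction(1, 2), "b": Fraction(1, 3), "c": Fraction(1, 6)}
X = frozenset(w)

def mu(E):
    return sum((w[x] for x in E), Fraction(0))

# All subsets of X, i.e. the sigma-algebra P(X).
subsets = [frozenset(s) for r in range(len(X) + 1) for s in combinations(sorted(X), r)]

def check_proposition_3_4():
    for E in subsets:
        for F in subsets:
            if E <= F:
                assert mu(E) <= mu(F)                        # (a) monotonicity
                assert mu(F - E) == mu(F) - mu(E)            # (d)
            assert mu(E | F) <= mu(E) + mu(F)                # (b), for two sets
            assert mu(E | F) == mu(E) + mu(F) - mu(E & F)    # (c)
    return True

print(check_proposition_3_4())
```

Exact rational arithmetic via `Fraction` is used so that the equalities in (c) and (d) are tested without rounding error.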


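Proposition 3.5 (Continuity of measures)
(a) If $E_1 \subseteq E_2 \subseteq \ldots$ and $E = \bigcup_n E_n$, then $\mu(E) = \lim_n \mu(E_n)$.


(b) If $F_1 \supseteq F_2 \supseteq \ldots$, $F = \bigcap_n F_n$ and $\mu(F_1) < \infty$, then $\mu(F) = \lim_n \mu(F_n)$.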


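Proof. For (a), let $A_1 = E_1$ and recursively $A_n = E_n \setminus E_{n-1}$. Then
$E = \bigcup_n A_n$ and the $A_n$'s are disjoint, so

$$\mu(E) = \sum_{j=1}^\infty \mu(A_j) = \lim_n \sum_{j=1}^n \mu(A_j) = \lim_n \mu(E_n)$$

since $E_n = \bigcup_{j=1}^n A_j$. Now (b) follows from applying (a) to $E_n = F_1 \setminus F_n$ and
$E = F_1 \setminus F$ and using Proposition 3.4(d), whose proof only needs the sets involved
to have finite measure, which holds here since $\mu(F_1) < \infty$. $\Box$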


Corollary 3.6 If $\mu(N_n) = 0$ for all $n$, then $\mu(\bigcup_n N_n) = 0$.


Proof. Apply e.g. Proposition 3.4(b). $\Box$


3.4 “Almost everywhere” and completeness


Let $S$ be a proposition about points of $X$ and suppose that $F = \{x : S(x) \text{ is false}\}$
is measurable. If $\mu(F) = 0$, then $S$ is said to hold almost everywhere (with respect
to $\mu$ if other measures are also under discussion), abbreviated a.e. In case $\mu$ is a
probability measure, one often instead says that $S$ holds almost surely, abbreviated
a.s.

If $S$ holds a.e. and $T$ is another proposition such that $T(x)$ is true whenever $S(x)$
is true, then one would clearly want to think of $T$ as also holding a.e. However,
this is not so in general, since even if $\mu(F) = 0$, it may be the case that some
subset $E$ of $F$ is not measurable. If $(X, \mathcal{M}, \mu)$ is such that $E \in \mathcal{M}$ whenever
$E \subseteq F$, $F \in \mathcal{M}$ and $\mu(F) = 0$, then the measure space is said to be complete and
$\mu$ is said to be a complete measure.

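If $\mu$ is not complete, then one can always extend the measure space, by defining
the larger $\sigma$-algebra

$$\bar{\mathcal{M}} = \{E \cup F : E \in \mathcal{M}, \ \exists N \in \mathcal{M} : F \subseteq N, \ \mu(N) = 0\}$$

(exercise: prove that $\bar{\mathcal{M}}$ is a $\sigma$-algebra) and the measure $\bar{\mu}$ on $\bar{\mathcal{M}}$ by
$\bar{\mu}(E \cup F) = \mu(E)$. Then $(X, \bar{\mathcal{M}}, \bar{\mu})$ is complete and $\bar{\mu}$ is called the
completion of $\mu$.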


3.5 Dynkin’s Lemma and the Uniqueness Theorem


Dynkin’s Lemma will be a fundamental tool for proving theorems. It is based on
the concepts of $\pi$-systems and d-systems. A $\pi$-system is a class $\mathcal{I}$ of subsets of $X$
that is closed under finite intersections, i.e. $E \cap F \in \mathcal{I}$ whenever $E, F \in \mathcal{I}$. The
definition of a d-system follows.


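Definition 3.7 Let $\mathcal{D}$ be a class of subsets of $X$. Then $\mathcal{D}$ is said to be a d-system if
(a) $X \in \mathcal{D}$,
(b) $E, F \in \mathcal{D}$, $E \subseteq F \Rightarrow F \setminus E \in \mathcal{D}$,
(c) $E_n \in \mathcal{D}$, $E_n \uparrow E \Rightarrow E \in \mathcal{D}$.

Generated d-systems are defined analogously with generated $\sigma$-algebras:

$$d(\mathcal{C}) = \bigcap \{\mathcal{D} \supseteq \mathcal{C} : \mathcal{D} \text{ a d-system}\}.$$

(Check that any intersection of d-systems is a d-system.)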


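Theorem 3.8 Let $\mathcal{M}$ be a class of subsets of $X$. Then $\mathcal{M}$ is a $\sigma$-algebra if and
only if it is a $\pi$-system and a d-system.


Proof. The only if direction is obvious. The if direction follows since
$X \in \mathcal{M}$ by (a) in the definition of a d-system, $E^c = X \setminus E \in \mathcal{M}$ whenever
$E \in \mathcal{M}$ by (b), and if $E_n \in \mathcal{M}$, $n = 1, 2, \ldots$, then
$F_n := \bigcup_{j=1}^n E_j = (\bigcap_{j=1}^n E_j^c)^c \in \mathcal{M}$
since $\mathcal{M}$ is a $\pi$-system, so $E := \bigcup_{j=1}^\infty E_j \in \mathcal{M}$ by (c) since $F_n \uparrow E$. $\Box$


Since any $\sigma$-algebra is also a d-system, it follows that $\sigma(\mathcal{C}) \supseteq d(\mathcal{C})$ for any $\mathcal{C}$.
Dynkin’s Lemma provides an answer to when we have equality.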


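Theorem 3.9 (Dynkin’s Lemma)
If $\mathcal{I}$ is a $\pi$-system, then $d(\mathcal{I}) = \sigma(\mathcal{I})$.


Proof. It suffices to prove that $d(\mathcal{I}) \supseteq \sigma(\mathcal{I})$. By Theorem 3.8 it thus suffices
to prove that $d(\mathcal{I})$ is a $\pi$-system. In other words, it suffices to prove that

$$\mathcal{D}_2 := \{B \in d(\mathcal{I}) : B \cap C \in d(\mathcal{I}) \text{ for all } C \in d(\mathcal{I})\}$$

equals $d(\mathcal{I})$. The proof is done in two similar steps. For step 1, define

$$\mathcal{D}_1 := \{B \in d(\mathcal{I}) : B \cap C \in d(\mathcal{I}) \text{ for all } C \in \mathcal{I}\}.$$

Since $\mathcal{I}$ is a $\pi$-system, $\mathcal{D}_1$ contains $\mathcal{I}$, so if we can show that $\mathcal{D}_1$ is a d-system, then
$\mathcal{D}_1 = d(\mathcal{I})$. Part (a) in the definition of a d-system obviously holds. If
$B_1, B_2 \in \mathcal{D}_1$ and $B_1 \subseteq B_2$, then for any $C \in \mathcal{I}$,
$(B_2 \setminus B_1) \cap C = (B_2 \cap C) \setminus (B_1 \cap C) \in d(\mathcal{I})$
since $d(\mathcal{I})$ is a d-system. Hence part (b) holds for $\mathcal{D}_1$. Finally if $B_n \in \mathcal{D}_1$ and
$B_n \uparrow B$, then $B_n \cap C \uparrow B \cap C$, so $B \in \mathcal{D}_1$ since $d(\mathcal{I})$ is a d-system.

That $\mathcal{D}_1 = d(\mathcal{I})$ means that $\mathcal{D}_2 \supseteq \mathcal{I}$, so it suffices now to prove that $\mathcal{D}_2$ is a
d-system, which is now done in complete analogy with step 1. (Check that you
can fill this in.) $\Box$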
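On a finite space, both $d(\mathcal{C})$ and $\sigma(\mathcal{C})$ can be computed by closing $\mathcal{C}$ under the defining operations, which makes it possible to see Dynkin’s Lemma at work and to see that the $\pi$-system assumption matters. The sketch below assumes a finite $X$, where condition (c) for d-systems holds automatically because increasing sequences of sets stabilise; all function names are hypothetical.

```python
def close(X, C, step):
    """Close the family {X} + C under the operation `step` until nothing new appears."""
    X = frozenset(X)
    fam = {X} | {frozenset(E) for E in C}
    while True:
        new = {G for E in fam for F in fam for G in step(X, E, F)} - fam
        if not new:
            return fam
        fam |= new

def sigma_step(X, E, F):
    return [X - E, E | F]                    # complements and unions

def d_step(X, E, F):
    return [F - E] if E <= F else []         # proper differences only

def sigma_algebra(X, C):
    return close(X, C, sigma_step)

def d_system(X, C):
    return close(X, C, d_step)

X = {1, 2, 3, 4}
pi_system = [{1, 2}, {2, 3}, {2}]            # closed under intersections
not_pi = [{1, 2}, {2, 3}]                    # the intersection {2} is missing

print(d_system(X, pi_system) == sigma_algebra(X, pi_system))
print(len(d_system(X, not_pi)), len(sigma_algebra(X, not_pi)))
```

For the class that is not a $\pi$-system, the generated d-system is strictly smaller than the generated $\sigma$-algebra, so equality in Theorem 3.9 can fail without that assumption.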


Our first application is the following uniqueness theorem for measures.


Theorem 3.10 (Uniqueness of finite measures)

Suppose that $\mathcal{I}$ is a $\pi$-system and $\mathcal{M} = \sigma(\mathcal{I})$. If $\mu_1$ and $\mu_2$ are two measures
on $\mathcal{M}$ such that $\mu_1(X) = \mu_2(X) < \infty$ and $\mu_1(I) = \mu_2(I)$ for all $I \in \mathcal{I}$, then
$\mu_1 = \mu_2$.


Proof. By Dynkin’s Lemma, it suffices to prove that
$\mathcal{D} := \{E \in \mathcal{M} : \mu_1(E) = \mu_2(E)\}$ is a d-system. That $X \in \mathcal{D}$ follows from the
first part of the assumption. If $E, F \in \mathcal{D}$ and $E \subseteq F$, then
$\mu_1(F \setminus E) = \mu_1(F) - \mu_1(E) = \mu_2(F) - \mu_2(E) = \mu_2(F \setminus E)$, so $F \setminus E \in \mathcal{D}$.
Finally if $E_n \in \mathcal{D}$ and $E_n \uparrow E$, then $\mu_1(E_n) = \mu_2(E_n)$, so
$\mu_1(E) = \mu_2(E)$ by the continuity of measures. $\Box$


Corollary 3.11 If two probability measures on $\sigma(\mathcal{I})$ agree on the $\pi$-system $\mathcal{I}$,
then they are equal.


3.6 Borel-Cantelli’s First Lemma 


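Definition 3.12 Let $E_1, E_2, \ldots$ be subsets of $X$. Then

$$\limsup_n E_n := \bigcap_{m=1}^\infty \bigcup_{n=m}^\infty E_n,$$

$$\liminf_n E_n := \bigcup_{m=1}^\infty \bigcap_{n=m}^\infty E_n.$$


Note that

$$\limsup_n E_n = \{x \in X : x \in E_n \text{ for infinitely many } n\}$$

and

$$\liminf_n E_n = \{x \in X : x \in E_n \text{ for all but finitely many } n\}.$$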




One sometimes writes $E_n$ i.o. for $\limsup_n E_n$, where i.o. stands for “infinitely
often”. (There is no corresponding abbreviation for $\liminf_n E_n$.)

Let $(X, \mathcal{M}, \mu)$ be a measure space and suppose that $E_1, E_2, \ldots \in \mathcal{M}$. Since
a $\sigma$-algebra is closed under countable intersections and unions, it is clear that
$\limsup_n E_n$ and $\liminf_n E_n$ are then also measurable.
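For a sequence of sets that is eventually periodic, the two limits can be read off directly from the characterisations “infinitely many $n$” and “all but finitely many $n$”: the lim sup is the union over one period and the lim inf is the intersection over one period. The following sketch assumes such a sequence, given as a finite prefix followed by a cycle repeated forever; the function name and the example are hypothetical.

```python
def limsup_liminf(prefix, cycle):
    """lim sup and lim inf of the sequence: prefix, then `cycle` repeated forever.

    The prefix is ignored on purpose: finitely many sets never affect either limit.
    """
    cycle = [frozenset(E) for E in cycle]
    limsup = frozenset().union(*cycle)         # points in E_n for infinitely many n
    liminf = frozenset.intersection(*cycle)    # points in E_n for all but finitely many n
    return limsup, liminf

# E_1 = {9}, then {1, 2}, {2, 3}, {1, 2}, {2, 3}, ...
sup, inf = limsup_liminf([{9}], [{1, 2}, {2, 3}])
print(sorted(sup), sorted(inf))
```

The point 9 belongs to only one set of the sequence and therefore to neither limit, while 2 belongs to every set of the cycle and hence to both.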


Lemma 3.13 (Borel-Cantelli’s Lemma I)
If $\sum_n \mu(E_n) < \infty$, then $\mu(\limsup_n E_n) = 0$.


Proof. Write $F_m = \bigcup_{n=m}^\infty E_n$ and $F = \limsup_n E_n$. Then $F_m \downarrow F$. Since
$\bigcup_{n=1}^N E_n \uparrow F_1$, it follows from the continuity of measures (from below) and the
hypothesis that

$$\mu(F_1) = \lim_N \mu(\bigcup_{n=1}^N E_n) \le \lim_N \sum_{n=1}^N \mu(E_n) = \sum_{n=1}^\infty \mu(E_n) < \infty.$$

Hence the continuity of measures (from above) and the hypothesis imply that

$$\mu(F) = \lim_m \mu(F_m) \le \lim_m \sum_{n=m}^\infty \mu(E_n) = 0.$$

$\Box$


The Borel-Cantelli Lemma is an important tool, in particular in probability
theory.

Example. (The doubling strategy.)

Assume that $(X, \mathcal{M}, P)$ is a probability space and suppose that $E_1, E_2, \ldots$ are
events such that $P(E_n) = 2^{-n}$, $n = 1, 2, \ldots$. Then by the Borel-Cantelli Lemma,

$$P(\limsup_n E_n) = P(E_n \text{ i.o.}) = 0.$$

One way to describe this in words is the following. Suppose we play a sequence
of games such that at the $n$'th game we win one currency unit (c.u.) with probability
$1 - 2^{-n}$ and lose $2^n - 1$ c.u. with probability $2^{-n}$. Each game is fair in terms of
expectation, but by the Borel-Cantelli Lemma, we will almost surely lose money only
finitely many times. Hence, over the whole infinite sequence of games, we will almost
surely win an infinite amount of money. (In practice this strategy fails, of course,
since there are always some bounds that will upset things, e.g. one can only play
a certain number of games in a lifetime.) $\Box$
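The two numerical facts used in the example, that each game has expectation zero and that the loss probabilities have a summable tail, can be checked with exact arithmetic. The sketch below is only an illustration; the function names are hypothetical, and the tail sum is truncated at a finite index $N$, which is an assumption of the sketch rather than part of the lemma.

```python
from fractions import Fraction

def p_loss(n):
    """P(E_n) = 2^-n, the probability of losing game n."""
    return Fraction(1, 2 ** n)

def expected_gain(n):
    """Win 1 c.u. with probability 1 - 2^-n, lose 2^n - 1 c.u. with probability 2^-n."""
    return (1 - p_loss(n)) * 1 - p_loss(n) * (2 ** n - 1)

def tail_sum(m, N=60):
    """Sum of P(E_n) for n = m, ..., N: a truncation of the bound on P(F_m)
    used in the proof of the Borel-Cantelli Lemma."""
    return sum(p_loss(n) for n in range(m, N + 1))

print([expected_gain(n) for n in range(1, 6)])
print(float(tail_sum(1)), float(tail_sum(10)), float(tail_sum(20)))
```

The full tail $\sum_{n \ge m} 2^{-n}$ equals $2^{-(m-1)}$, which tends to $0$ as $m$ grows; this is exactly the quantity that bounds the probability of losing at some game from the $m$'th onwards.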
3.7 Carathéodory’s Extension Theorem

A set function $\mu^* : \mathcal{P}(X) \to [0, \infty]$ is said to be an outer measure if

• $\mu^*(\emptyset) = 0$,

• $\mu^*(E) \le \mu^*(F)$ whenever $E \subseteq F$,

• $\mu^*(\bigcup_{n=1}^\infty E_n) \le \sum_{n=1}^\infty \mu^*(E_n)$ for all sets $E_1, E_2, \ldots$.


If $\mu^*$ is an outer measure, then we say that a set $A \in \mathcal{P}(X)$ is $\mu^*$-measurable if,
for all $E \in \mathcal{P}(X)$,

$$\mu^*(E) = \mu^*(E \cap A) + \mu^*(E \cap A^c).$$

By the definition of outer measure, it is immediate that the left hand side is
bounded by the right hand side, so to prove that a given set $A$ is $\mu^*$-measurable,
it suffices to show that $\mu^*(E) \ge \mu^*(E \cap A) + \mu^*(E \cap A^c)$ for arbitrary $E$ with
$\mu^*(E) < \infty$.


Theorem 3.14 (Carathéodory’s Extension Theorem)

Let $\mathcal{A}$ be an algebra on $X$ and let $\mu_0 : \mathcal{A} \to [0, \infty]$ be a countably additive
set function. Then there exists a measure $\mu$ on $\sigma(\mathcal{A})$ such that $\mu(A) = \mu_0(A)$ for
all $A \in \mathcal{A}$. If $\mu_0(X) < \infty$, then $\mu$ is the unique such measure.


The uniqueness part follows immediately from Theorem 3.10, since an algebra is in
particular a $\pi$-system. The existence part will be proved via a sequence of claims.
These will also reveal some other useful facts, apart from the statement of the
theorem.

Claim I. Let $\mu^*$ be an outer measure and let $\mathcal{M}$ be the collection of $\mu^*$-measurable
sets. Then $\mathcal{M}$ is a $\sigma$-algebra. Moreover, the restriction of $\mu^*$ to $\mathcal{M}$ is a complete
measure.

Proof. It is obvious that $X \in \mathcal{M}$. From the symmetry between $A$ and $A^c$
in the definition of $\mu^*$-measurability, it is also obvious that $\mathcal{M}$ is closed under
complements. It remains to show that $\mathcal{M}$ is closed under countable unions.

Suppose that $A, B \in \mathcal{M}$ and let $E$ be an arbitrary subset of $X$. Then
$A \cup B \in \mathcal{M}$ since

$$\mu^*(E) = \mu^*(E \cap A) + \mu^*(E \cap A^c)$$
$$= \mu^*(E \cap A \cap B) + \mu^*(E \cap A \cap B^c) + \mu^*(E \cap A^c \cap B) + \mu^*(E \cap A^c \cap B^c)$$
$$\ge \mu^*(E \cap (A \cup B)) + \mu^*(E \cap (A \cup B)^c)$$

where the last inequality follows from the fact that
$A \cup B = (A \cap B) \cup (A \cap B^c) \cup (A^c \cap B)$,
so that the definition of outer measure implies that the first three terms in the
middle expression bound the first term in the last expression, and that
$(A \cup B)^c = A^c \cap B^c$. Moreover, if $A \cap B = \emptyset$, then $(A \cup B) \cap A = A$ and
$(A \cup B) \cap A^c = B$, so applying the definition of $\mu^*$-measurability of $A$ with
$E = A \cup B$ gives

$$\mu^*(A \cup B) = \mu^*(A) + \mu^*(B).$$

In summary $\mathcal{M}$ is closed under finite unions and $\mu^*$ is additive on $\mathcal{M}$.

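Now suppose that $A_j \in \mathcal{M}$, $j = 1, 2, \ldots$ are disjoint sets. Write
$B_n = \bigcup_{j=1}^n A_j$ and $B = \bigcup_{j=1}^\infty A_j$. Let $E$ be an arbitrary subset of $X$. By the
$\mu^*$-measurability of $A_n$,

$$\mu^*(E \cap B_n) = \mu^*(E \cap B_n \cap A_n) + \mu^*(E \cap B_n \cap A_n^c) = \mu^*(E \cap A_n) + \mu^*(E \cap B_{n-1})$$

so by induction it follows that

$$\mu^*(E \cap B_n) = \sum_{j=1}^n \mu^*(E \cap A_j).$$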


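Above, we proved that $\mathcal{M}$ is closed under finite unions, so $B_n \in \mathcal{M}$ for each $n$.
Hence

$$\mu^*(E) = \mu^*(E \cap B_n) + \mu^*(E \cap B_n^c) = \sum_{j=1}^n \mu^*(E \cap A_j) + \mu^*(E \cap B_n^c)$$
$$\ge \sum_{j=1}^n \mu^*(E \cap A_j) + \mu^*(E \cap B^c).$$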


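Letting $n \to \infty$ and using the definition of outer measure, it follows that

$$\mu^*(E) \ge \sum_{j=1}^\infty \mu^*(E \cap A_j) + \mu^*(E \cap B^c) \ge \mu^*(\bigcup_{j=1}^\infty (E \cap A_j)) + \mu^*(E \cap B^c)$$
$$= \mu^*(E \cap B) + \mu^*(E \cap B^c) \ge \mu^*(E).$$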


Hence all the inequalities must be equalities and it follows that $B \in \mathcal{M}$. This
proves that $\mathcal{M}$ is closed under disjoint countable unions and it is an easy exercise
to show that this entails that $\mathcal{M}$ is closed under arbitrary countable unions, i.e. $\mathcal{M}$
is a $\sigma$-algebra. Moreover, taking $E = B$ gives

$$\mu^*(B) = \sum_{j=1}^\infty \mu^*(A_j)$$

proving that the restriction of $\mu^*$ to $\mathcal{M}$ is a measure. It remains to prove
completeness. Assume that $N \in \mathcal{M}$, $\mu^*(N) = 0$ and $A \subseteq N$. Then $\mu^*(A) = 0$ by the
definition of outer measure. Therefore

$$\mu^*(E) \le \mu^*(E \cap A) + \mu^*(E \cap A^c) = \mu^*(E \cap A^c) \le \mu^*(E)$$

proving that $A \in \mathcal{M}$. $\Box$


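Next assume that $\mu_0$ is a countably additive set function on the algebra $\mathcal{A}$.
Define $\mu^* : \mathcal{P}(X) \to [0, \infty]$ by

$$\mu^*(E) = \inf\{\sum_{j=1}^\infty \mu_0(A_j) : A_j \in \mathcal{A}, \ \bigcup_{j=1}^\infty A_j \supseteq E\}. \qquad (1)$$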


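Claim II. $\mu^*$ is an outer measure.

Proof. It is trivial that $\mu^*(\emptyset) = 0$ and $E \subseteq F \Rightarrow \mu^*(E) \le \mu^*(F)$. It remains
to prove countable subadditivity. Fix $\epsilon > 0$. If $E_j \in \mathcal{P}(X)$, $j = 1, 2, \ldots$, then
for each $j$ one can find $A_j(k) \in \mathcal{A}$, $k = 1, 2, \ldots$ so that $\bigcup_k A_j(k) \supseteq E_j$ and
$\sum_k \mu_0(A_j(k)) \le \mu^*(E_j) + \epsilon 2^{-j}$. Since $\bigcup_{j,k} A_j(k) \supseteq \bigcup_j E_j$ we get

$$\mu^*(\bigcup_j E_j) \le \sum_{j,k} \mu_0(A_j(k)) \le \sum_j \mu^*(E_j) + \epsilon,$$

and since $\epsilon > 0$ was arbitrary, $\mu^*(\bigcup_j E_j) \le \sum_j \mu^*(E_j)$ as desired. $\Box$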


For the final two claims, it is assumed that $\mu^*$ is defined by (1) and $\mathcal{M}$ is the
$\sigma$-algebra of $\mu^*$-measurable sets.


Claim III. $\mu^*(E) = \mu_0(E)$ for all $E \in \mathcal{A}$.

Proof. If $E \in \mathcal{A}$, take $A_1 = E$ and $A_2 = A_3 = \ldots = \emptyset$ in the definition
of $\mu^*$ to see that $\mu^*(E) \le \mu_0(E)$. Proving the reverse inequality amounts to
showing that $\mu_0(E) \le \sum_j \mu_0(A_j)$ whenever $A_j \in \mathcal{A}$ and $\bigcup_j A_j \supseteq E$. Let
$B_n = E \cap (A_n \setminus \bigcup_{j=1}^{n-1} A_j)$. Then the $B_n$'s are disjoint and $\bigcup_n B_n = E$. By the
countable additivity of $\mu_0$, it follows that

$$\mu_0(E) = \sum_n \mu_0(B_n) \le \sum_n \mu_0(A_n).$$

$\Box$


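Claim IV. $\mathcal{A} \subseteq \mathcal{M}$.


Proof. Pick $A \in \mathcal{A}$, an arbitrary $E \subseteq X$ with $\mu^*(E) < \infty$ and $\epsilon > 0$. By the
definition of $\mu^*$, there exist $B_j \in \mathcal{A}$ such that $\bigcup_j B_j \supseteq E$ and
$\sum_j \mu_0(B_j) < \mu^*(E) + \epsilon$. We get, by the additivity of $\mu_0$ on $\mathcal{A}$,

$$\mu^*(E) + \epsilon > \sum_j \mu_0(B_j \cap A) + \sum_j \mu_0(B_j \cap A^c)$$
$$\ge \mu^*(E \cap A) + \mu^*(E \cap A^c)$$

where the last inequality follows from the definition of $\mu^*$, since the sets
$B_j \cap A$ cover $E \cap A$ and the sets $B_j \cap A^c$ cover $E \cap A^c$. Since $\epsilon$ was
arbitrary, $A$ is $\mu^*$-measurable. $\Box$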


Taken together, these four claims prove Carathéodory’s Theorem. 
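The construction can be traced by brute force on a small finite example. The sketch below assumes a five-point space, an algebra generated by three blocks with hypothetical weights, and uses the observation that on a finite space the infimum in (1) is attained by a single covering set of the algebra (the union of any cover is again in the algebra and has no larger $\mu_0$-value). All names are illustrative.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical blocks (atoms) of the algebra with their mu0-values.
atoms = {
    frozenset({0, 1}): Fraction(0),       # a null block with two points
    frozenset({2, 3}): Fraction(1, 2),
    frozenset({4}): Fraction(1, 2),
}
X = frozenset().union(*atoms)

def powerset(S):
    S = sorted(S)
    return [frozenset(c) for r in range(len(S) + 1) for c in combinations(S, r)]

# The algebra: all unions of atoms; mu0 adds up the weights of the atoms used.
algebra = {}
for r in range(len(atoms) + 1):
    for chosen in combinations(atoms, r):
        algebra[frozenset().union(*chosen)] = sum((atoms[a] for a in chosen), Fraction(0))

def outer(E):
    """mu*(E): the smallest mu0-value of a set of the algebra covering E."""
    return min(m for A, m in algebra.items() if E <= A)

def is_measurable(A):
    """Caratheodory's condition, tested against every subset E of X."""
    return all(outer(E) == outer(E & A) + outer(E - A) for E in powerset(X))

measurable = [A for A in powerset(X) if is_measurable(A)]
print(len(algebra), len(measurable))
```

In this example the $\mu^*$-measurable sets form a strictly larger family than the algebra: subsets of the null block such as {0} are measurable, in line with the completeness statement of Claim I, whereas {2} splits a block of positive mass and is not.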


3.8 The Lebesgue measure and Lebesgue-Stieltjes measures 


Up to now, we have not seen any concrete examples of non-trivial measures.
When $X$ is a countable space, $X = \{x_1, x_2, \ldots\}$, then it is easy to construct
such measures. Take e.g. $\mathcal{M} = \mathcal{P}(X)$, let $\{w(x_j)\}_{j=1}^\infty$ be any collection of
non-negative numbers and let $\mu$ be defined by $\mu(A) = \sum_{x \in A} w(x)$. We have also seen
that for $X = (0, 1]$ and $\mathcal{M} = \mathcal{P}(X)$, no sensible length measure exists. We are
now equipped with the tools needed to construct a proper length measure on $\mathbb{R}$.
Since it is not possible to do this for all subsets, we have to settle for a smaller
$\sigma$-algebra. Clearly it would be “unnatural” to expect to be able to measure the
length of sets of the form constructed in Section 2 via the axiom of choice. On the
other hand, any sensible length measure must be able to measure the length of an
interval. If we could also measure the length of any set that can be constructed from a
countable number of set operations on intervals, then it is difficult enough to come
up with an example of a set which would not have a length (such as the set $A$ in
Section 2) and even harder to motivate why one would even wish to give such a
set a length if doing so causes problems. This point of view is what we are going
to adopt.

Now recall that the Borel $\sigma$-algebra is the $\sigma$-algebra generated by all intervals
and hence, by virtue of being a $\sigma$-algebra, contains all sets we wish to assign a
length to. Hence the aim is to construct a length measure on $\mathcal{B}(\mathbb{R})$. It turns out to
be slightly more comfortable to restrict to $(0, 1]$ and $\mathcal{B}(0, 1]$. Having done so, we
obviously also have length measures on $(n, n + 1]$ for all $n \in \mathbb{Z}$ by translation and
can extend to the whole real line by defining, for $E \in \mathcal{B}(\mathbb{R})$, the length of
$E$ to be the sum of the lengths of $E \cap (n, n + 1]$, $n \in \mathbb{Z}$.

Let $X = (0, 1]$ and let $\mathcal{A}$ be the algebra consisting of finite disjoint unions of
intervals of the type $(a, b]$, $0 \le a \le b \le 1$. Hence any $A \in \mathcal{A}$ can be written as
$\bigcup_{j=1}^n (a_j, b_j]$ for some $n \in \mathbb{Z}_+$ and the $(a_j, b_j]$'s disjoint. Define
$\mu_0 : \mathcal{A} \to [0, 1]$ by

$$\mu_0(\bigcup_{j=1}^n (a_j, b_j]) = \sum_{j=1}^n (b_j - a_j).$$


Clearly the length of any set in $\mathcal{A}$ must be given by $\mu_0(A)$, so we would like to
extend $\mu_0$ to a measure on $\mathcal{B}(0, 1] = \sigma(\mathcal{A})$. By Carathéodory’s Extension
Theorem, there is a unique such extension, provided that $\mu_0$ is a countably additive
set function on $\mathcal{A}$. It is trivial that $\mu_0(\emptyset) = 0$ and that $\mu_0$ is additive, but
countable additivity is not so clear. It must be proved that
$\mu_0(\bigcup_{n=1}^\infty A_n) = \sum_{n=1}^\infty \mu_0(A_n)$
whenever $A_1, A_2, \ldots$ are disjoint sets in $\mathcal{A}$ and $A := \bigcup_{n=1}^\infty A_n \in \mathcal{A}$. Since
$\mu_0$ is finitely additive, we may assume without loss of generality that the $A_n$'s and
$A$ consist of a single interval: $A_n = (a_n, b_n]$ and $A = (a, b]$.
On one hand, by finite additivity,

$$\mu_0(A) = \mu_0(A \setminus \bigcup_{j=1}^n A_j) + \mu_0(\bigcup_{j=1}^n A_j) \ge \mu_0(\bigcup_{j=1}^n A_j) = \sum_{j=1}^n \mu_0(A_j)$$

for every $n$, so letting $n \to \infty$ gives

$$\mu_0(A) \ge \sum_{j=1}^\infty \mu_0(A_j).$$


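Now we focus on the reverse inequality. Fix $\epsilon \in (0, b - a)$. Every point of the
compact set $[a + \epsilon, b]$ lies in some $(a_n, b_n]$, so the open sets
$(a_n, b_n + \epsilon 2^{-n})$ form an open cover of $[a + \epsilon, b]$, which can hence be reduced
to a finite subcover. Discard from the subcover those intervals that are contained in
one of the others or that do not meet $[a + \epsilon, b]$, and relabel the remaining ones as
$(\alpha_j, \gamma_j)$, $j = 1, \ldots, N$, so that $\alpha_1 < \alpha_2 < \ldots < \alpha_N$ and
$\gamma_1 < \gamma_2 < \ldots < \gamma_N$. Since these intervals still cover the interval
$[a + \epsilon, b]$, consecutive ones must overlap, i.e. $\alpha_{j+1} < \gamma_j$ for all
$j = 1, \ldots, N - 1$. Hence

$$b - a - \epsilon \le \gamma_N - \alpha_1 \le \sum_{j=1}^N (\gamma_j - \alpha_j) \le \sum_{n=1}^\infty (b_n - a_n + \epsilon 2^{-n}) = \sum_{n=1}^\infty (b_n - a_n) + \epsilon,$$

where the third inequality holds because each $(\alpha_j, \gamma_j)$ is one of the intervals
$(a_n, b_n + \epsilon 2^{-n})$ and all terms are non-negative. Hence

$$\mu_0(A) = b - a \le \sum_{n=1}^\infty \mu_0(A_n) + 2\epsilon.$$

Since $\epsilon$ was arbitrary, this establishes that $\mu_0$ is countably additive.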

Hence $\mu_0$ extends to a unique length measure $\mu$ on $\mathcal{B}(0, 1]$. This measure is
known as the Lebesgue measure and the notation we will use for it is $m$. Looking
back on the proof of Carathéodory’s Extension Theorem, we find that for sets
$E \in \mathcal{B}(0, 1]$ that are not in $\mathcal{A}$, $m(E)$ is explicitly expressed in terms of $\mu_0$ by

$$\mu^*(E) = \inf\{\sum_n \mu_0(A_n) : A_n \in \mathcal{A}, \ \bigcup_{n=1}^\infty A_n \supseteq E\} \qquad (2)$$

and $m$ is the restriction of the outer measure $\mu^*$ to $\mathcal{B}(0, 1]$. Moreover, we recall that
$\mu_0$ actually extends to a complete measure on the $\sigma$-algebra $\mathcal{M}$ of $\mu^*$-measurable
sets. This $\sigma$-algebra contains $\mathcal{A}$ and hence $\mathcal{B}(0, 1]$, but nothing says that it could
not be larger. Indeed, it turns out that $\mathcal{M}$ equals the completion of $\mathcal{B}(0, 1]$ with
respect to $m$ and that this $\sigma$-algebra is strictly larger than the Borel $\sigma$-algebra. The
larger $\sigma$-algebra $\mathcal{M}$ is called the Lebesgue $\sigma$-algebra, denoted $\mathcal{L}(0, 1]$. Since this
extension comes at no extra cost, it will be assumed throughout that the Lebesgue
measure is the complete measure defined on $\mathcal{L}(0, 1]$, unless otherwise stated.

The construction of the Lebesgue measure can easily be generalized in the
following way. Let $F : \mathbb{R} \to \mathbb{R}$ be a non-decreasing right-continuous function.
Redefine the $\mu_0$ above by

$$\mu_{0,F}(\bigcup_{j=1}^n (a_j, b_j]) = \sum_{j=1}^n (F(b_j) - F(a_j)).$$

An analogous argument shows that $\mu_{0,F}$ is countably additive on $\mathcal{A}$ and hence
extends to a unique measure $\mu_F$ on $\mathcal{B}(\mathbb{R})$. For sets $E \in \mathcal{B}(\mathbb{R}) \setminus \mathcal{A}$, (2) becomes

$$\mu_F^*(E) = \inf\{\sum_n \mu_{0,F}(A_n) : A_n \in \mathcal{A}, \ \bigcup_{n=1}^\infty A_n \supseteq E\} \qquad (3)$$

and $\mu_F$ is the restriction of $\mu_F^*$ to $\mathcal{B}(\mathbb{R})$. As for the Lebesgue measure, the $\sigma$-algebra
$\mathcal{M}_F$ of $\mu_F^*$-measurable sets is strictly larger than $\mathcal{B}(\mathbb{R})$ and the restriction of
$\mu_F^*$ to $\mathcal{M}_F$ coincides with the completion of $\mu_F$. In analogy with the Lebesgue measure,
we will henceforth take the notation $\mu_F$ to denote this completion unless otherwise
stated. The measure $\mu_F$ thus constructed is called the Lebesgue-Stieltjes measure
associated to $F$.
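On finite disjoint unions of half-open intervals, the set function $\mu_{0,F}$ is just a sum of increments of $F$, which is easy to compute. The sketch below assumes a particular, hypothetical choice of $F$ (linear with slope 1/2 on $[0, 1)$ and a jump of size 1/2 at $x = 1$) to show how a jump of $F$ becomes a point mass; it does not compute the extended measure on general Borel sets.

```python
from fractions import Fraction

def F(x):
    """Hypothetical non-decreasing, right-continuous F:
    0 for x < 0, x/2 on [0, 1), and 1 for x >= 1 (a jump of 1/2 at x = 1)."""
    x = Fraction(x)
    if x < 0:
        return Fraction(0)
    if x < 1:
        return x / 2
    return Fraction(1)

def mu0_F(intervals):
    """mu_{0,F} of a finite disjoint union of half-open intervals (a, b],
    each given as a pair (a, b)."""
    return sum((F(b) - F(a) for a, b in intervals), Fraction(0))

half = Fraction(1, 2)
print(mu0_F([(0, 1)]))                       # total mass on (0, 1]
print(mu0_F([(0, half), (half, 1)]))         # same set, split in two pieces
print(mu0_F([(Fraction(9, 10), 1)]))         # small interval containing the jump
```

Intervals $(1 - \delta, 1]$ keep mass at least 1/2 however small $\delta$ is, reflecting the point mass at $x = 1$; taking $F(x) = x$ instead would recover the length of the intervals.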


From (3) it follows (exercise!) that a Lebesgue-Stieltjes measure satisfies the
following regularity properties, called outer regularity and inner regularity
respectively.


Proposition 3.15 For all $E \in \mathcal{M}_F$,

$$\mu_F(E) = \inf\{\mu_F(U) : U \text{ open}, \ U \supseteq E\} = \sup\{\mu_F(K) : K \text{ compact}, \ K \subseteq E\}.$$


Another property in the same vein is the following. 


Proposition 3.16 For all $E \in \mathcal{M}_F$ with $\mu_F(E) < \infty$ and $\epsilon > 0$, there exists a
set $A$, which is a finite union of open intervals, such that

$$\mu_F(A \Delta E) < \epsilon.$$


3.9 The Cantor Set 


For any $x \in \mathbb{R}$, we have $m(\{x\}) = 0$, so for any countable subset $E \subseteq \mathbb{R}$,
$m(E) = 0$. Does the reverse implication also hold? I.e. are countable sets the
only ones to have Lebesgue measure 0? The answer is no. The most well-known
example is the Cantor set. It is constructed in the following way. For
$n = 1, 2, 3, \ldots$, let

$$D_n = \bigcup_{j=0}^{3^{n-1}-1} ((3j + 1)3^{-n}, (3j + 2)3^{-n}).$$

Let $C_1 = [0, 1] \setminus D_1$ and recursively $C_n = C_{n-1} \setminus D_n$. Let $C = \bigcap_{n=1}^\infty C_n$. The set
$C$ is the Cantor set.

In words, the process is the following. Start with the closed unit interval with
the open middle third removed; this is $C_1$. From each of the two closed intervals
that make up $C_1$, remove the open middle third to get $C_2$. Now $C_2$ is the union
of four closed intervals. Remove from each of these the open middle third to get $C_3$,
etc. The Cantor set is the limiting set of this process. Clearly $m(C_n) = (2/3)^n$,
so by the continuity of measures $m(C) = 0$.

On the other hand, $C$ has the same cardinality as $[0, 1]$. To see this, write each
number $x \in [0, 1]$ by its ternary expansion:

$$x = \sum_{n=1}^\infty a_n(x) 3^{-n}$$

where $a_n(x) \in \{0, 1, 2\}$. The expansion is unique for all $x$ except those of
the type $x = j3^{-n}$, $j \in \mathbb{Z}_+$, $0 < x < 1$, which have two expansions: one ending
with an infinite sequence of 0's and one ending with an infinite sequence of 2's. At
most one of these two expansions avoids the digit 1, and

$$C = \{x \in [0, 1] : x \text{ has an expansion with } a_n(x) \in \{0, 2\} \text{ for each } n\}.$$

Hence, by mapping each 2 to 1, we see that $C$ is in a 1-1-correspondence with the
set of all sequences $(b_n)$ of 0's and 1's. Since every $x \in [0, 1]$ has a binary
expansion $\sum_{n=1}^\infty b_n 2^{-n}$, these sequences map onto $[0, 1]$, so $C$ has at least, and
hence exactly, the cardinality of $[0, 1]$.
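The first few stages of the construction are easy to generate exactly, which makes the formula $m(C_n) = (2/3)^n$ concrete. This is a minimal sketch using rational endpoints; the function names are hypothetical, and `cantor_level(0)` is taken to be the unit interval itself.

```python
from fractions import Fraction

def cantor_step(intervals):
    """Remove the open middle third from each closed interval [a, b]."""
    out = []
    for a, b in intervals:
        third = (b - a) / 3
        out.append((a, a + third))
        out.append((b - third, b))
    return out

def cantor_level(n):
    """The closed intervals making up C_n, starting from [0, 1] for n = 0."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(n):
        intervals = cantor_step(intervals)
    return intervals

def total_length(intervals):
    return sum((b - a for a, b in intervals), Fraction(0))

for n in range(1, 5):
    level = cantor_level(n)
    print(n, len(level), total_length(level))
```

Each step doubles the number of intervals and multiplies the total length by 2/3, so the lengths tend to 0 while the number of endpoints, all of which belong to $C$, grows without bound.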


4 Measurable functions / random variables 
Let $(X, \mathcal{M}, \mu)$ be a measure space and let $(Y, \mathcal{N})$ be a measurable space.


Definition 4.1 A function $f : X \to Y$ is said to be $(\mathcal{M}, \mathcal{N})$-measurable if
$f^{-1}(A) \in \mathcal{M}$ for all $A \in \mathcal{N}$.


So $f$ is $(\mathcal{M}, \mathcal{N})$-measurable if $\{x \in X : f(x) \in A\}$ is $\mathcal{M}$-measurable whenever
$A$ is $\mathcal{N}$-measurable. In words, this could be phrased as that $f$ is measurable
if statements that “make sense” in terms of the values of $f$ also “make sense” in
terms of the values of $x$. See the probabilistic interpretation of this in the example
below.

When one of the $\sigma$-algebras is understood, we may speak of $f$ as simply
$\mathcal{M}$-measurable or $\mathcal{N}$-measurable and if $\mathcal{M}$ and $\mathcal{N}$ are both understood, we may speak
of $f$ as simply measurable. If $(X, \mathcal{M}, \mu)$ is a probability space, an
$(\mathcal{M}, \mathcal{N})$-measurable function is usually called a ($Y$-valued) random variable.

Example. Let $(X, \mathcal{M}, P)$ be a probability space and suppose
$(Y, \mathcal{N}) = (\mathbb{R}, \mathcal{B}(\mathbb{R}))$. Let $\xi : X \to \mathbb{R}$ be a random variable. This means that
$\xi$ is a $(\mathcal{M}, \mathcal{B}(\mathbb{R}))$-measurable function, i.e.

$$\xi^{-1}(B) = \{x \in X : \xi(x) \in B\} \in \mathcal{M}$$

whenever $B \in \mathcal{B}(\mathbb{R})$. Hence $P(\xi^{-1}(B)) = P(\xi \in B)$ is defined for all Borel sets
$B$. I.e. measurability means that it makes sense to speak of the probability that $\xi$
belongs to $B$ for any given Borel set $B$. $\Box$


Clearly the composition of two measurable functions is measurable. More
specifically, if $(Z, \mathcal{O})$ is a third measurable space, $f : X \to Y$ is
$(\mathcal{M}, \mathcal{N})$-measurable and $g : Y \to Z$ is $(\mathcal{N}, \mathcal{O})$-measurable, then, since
$(g \circ f)^{-1}(A) = f^{-1}(g^{-1}(A))$, $g \circ f$ is $(\mathcal{M}, \mathcal{O})$-measurable.

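The following result is an indispensable tool for proving that a given function
is measurable.


Theorem 4.2 Let $\mathcal{E}$ be a class of subsets of $Y$ and assume that $\mathcal{N} = \sigma(\mathcal{E})$. Then
$f : X \to Y$ is measurable if and only if $f^{-1}(A) \in \mathcal{M}$ for all $A \in \mathcal{E}$.


Proof. The only if direction is trivial. Let $\mathcal{F} = \{A \in \mathcal{N} : f^{-1}(A) \in \mathcal{M}\}$.
Since $\mathcal{F} \supseteq \mathcal{E}$, it suffices to show that $\mathcal{F}$ is a $\sigma$-algebra. The key is then to recall
that $f^{-1}$ commutes as an operator with the basic set operations, i.e.
$f^{-1}(A^c) = f^{-1}(A)^c$ and

$$f^{-1}(\bigcup_\alpha A_\alpha) = \bigcup_\alpha f^{-1}(A_\alpha), \qquad f^{-1}(\bigcap_\alpha A_\alpha) = \bigcap_\alpha f^{-1}(A_\alpha)$$

for all $A$ and $A_\alpha$ and $\alpha$ ranging over arbitrary index sets. Hence

• $X = f^{-1}(Y)$ and $X \in \mathcal{M}$ (since $\mathcal{M}$ is a $\sigma$-algebra), so $Y \in \mathcal{F}$,

• $A \in \mathcal{F} \Rightarrow f^{-1}(A) \in \mathcal{M} \Rightarrow f^{-1}(A)^c \in \mathcal{M} \Rightarrow f^{-1}(A^c) \in \mathcal{M} \Rightarrow A^c \in \mathcal{F}$,

• $A_n \in \mathcal{F}$, $n = 1, 2, \ldots \Rightarrow f^{-1}(A_n) \in \mathcal{M} \Rightarrow \bigcup_n f^{-1}(A_n) \in \mathcal{M} \Rightarrow
f^{-1}(\bigcup_n A_n) \in \mathcal{M} \Rightarrow \bigcup_n A_n \in \mathcal{F}$.

$\Box$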


Corollary 4.3 If $X$ and $Y$ are topological spaces and $\mathcal{M}$ and $\mathcal{N}$ are the Borel
σ-algebras, then any continuous function is measurable.


Proof. Let $f$ be continuous and let $\mathcal{T}$ be the topology (i.e. the family of open
sets) of $Y$. By the definition of continuity, $f^{-1}(U)$ is open for all $U \in \mathcal{T}$ and hence
measurable by the definition of the Borel σ-algebra on $X$. Since $\mathcal{B}(Y) = \sigma(\mathcal{T})$,
an application of Theorem 4.2 with $\mathcal{E} = \mathcal{T}$ gives the result. □

Corollary 4.4 A map $f : X \to \bar{\mathbb{R}}$ is (Borel-)measurable in either of the following
cases:

• $f^{-1}[-\infty, a] \in \mathcal{M}$ for all $a \in \mathbb{R}$,

• $f^{-1}[-\infty, a) \in \mathcal{M}$ for all $a \in \mathbb{R}$,

• $f^{-1}[a, \infty] \in \mathcal{M}$ for all $a \in \mathbb{R}$,

• $f^{-1}(a, \infty] \in \mathcal{M}$ for all $a \in \mathbb{R}$.

Since each of the four classes generates $\mathcal{B}(\bar{\mathbb{R}})$, the proofs follow on mimicking
the proof of Corollary 4.3. Of course analogous statements are valid if $\bar{\mathbb{R}}$ is
replaced with $\mathbb{R}$, $\mathbb{R}_+$ or $\bar{\mathbb{R}}_+$.


Example. Let $X$ be the sample space of a random experiment. Then $\xi : X \to \mathbb{R}$
is a random variable iff $\{\xi \le a\}$ is an event for all $a \in \mathbb{R}$. This is sometimes
taken as the definition of a random variable in courses which want to present the
necessary fundamentals without involving unnecessary measure-theoretic detail.


Theorem 4.5 Let $f, g : X \to \mathbb{R}$ be measurable and $\lambda \in \mathbb{R}$ a constant. Then
$f + g$, $\lambda f$ and $fg$ are all measurable functions. The same is true for $1/f$ provided
that $f(x) > 0$ for all $x \in X$.


Proof. We do $f + g$ and leave the other cases as exercises. By Corollary 4.4 it
suffices to show that $\{x : f(x) + g(x) < a\} \in \mathcal{M}$ for all $a \in \mathbb{R}$. However

\[ \{x : f(x) + g(x) < a\} = \bigcup_{q \in \mathbb{Q}} \big( \{x : f(x) < q\} \cap \{x : g(x) < a - q\} \big) \in \mathcal{M} \]

since $\mathbb{Q}$ is countable and $f$ and $g$ are measurable. □


Theorem 4.6 Assume that $f_1, f_2, \ldots$ are measurable. Then $\sup_n f_n$, $\inf_n f_n$,
$\limsup_n f_n$ and $\liminf_n f_n$ are measurable. Moreover, the set
$\{x : \lim_n f_n(x) \text{ exists}\}$ is measurable and if $\lim_n f_n(x)$ exists for all $x$, then
$\lim_n f_n$ is a measurable function.

Proof. That $\sup_n f_n$ is measurable follows from the observation that
$\{x : \sup_n f_n(x) \le a\} = \bigcap_n \{x : f_n(x) \le a\}$, a countable intersection of measurable
sets. Since constant functions are trivially measurable, we get that
$\inf_n f_n = 0 - \sup_n(-f_n)$ is measurable. Since $\limsup_n f_n = \inf_m \sup_{n \ge m} f_n$ and
$\liminf_n f_n = \sup_m \inf_{n \ge m} f_n$, these are then also measurable. If $\lim_n f_n(x)$ exists
for all $x$, then $\lim_n f_n = \liminf_n f_n = \limsup_n f_n$ and is hence measurable. Finally

\[ \{x : \lim_n f_n(x) \text{ exists}\} = \{x : \limsup_n f_n(x) - \liminf_n f_n(x) = 0\} \]

is measurable by Theorem 4.5 (since $\{0\} \in \mathcal{B}(\mathbb{R})$). □


Example. Construction of a uniform random variable.
Let $(X, \mathcal{M}, P) = ([0,1], \mathcal{B}, m)$ and $\xi(x) = x$, $x \in X$. Then $\xi$ is continuous
and hence a random variable and

\[ P(\xi \le a) = m\{x : \xi(x) \le a\} = m[0, a] = a, \qquad 0 \le a \le 1. \]


Example. Construction of a random variable with given distribution.

Assume that $F : \mathbb{R} \to \mathbb{R}$ is non-decreasing and right continuous with

\[ \lim_{x \to -\infty} F(x) = 0, \qquad \lim_{x \to \infty} F(x) = 1. \]

We want to construct a random variable $\xi$ so that $P(\xi \le a) = F(a)$. Recall the
Lebesgue-Stieltjes measure $\mu_F$. The conditions on $F$ imply that $\mu_F$ is a probability
measure, so let $(X, \mathcal{M}, P) = (\mathbb{R}, \mathcal{B}, \mu_F)$ and $\xi(x) = x$, $x \in \mathbb{R}$. Then

\[ P(\xi \le a) = \mu_F(-\infty, a] = F(a). \]


An alternative construction is the following, which is most conveniently described
in the case when $F$ is continuous and strictly increasing. Then $F^{-1}$ exists, so we
can take $(X, \mathcal{M}, P) = ([0,1], \mathcal{B}, m)$ and $\xi(x) = F^{-1}(x)$ and get

\[ P(\xi \le a) = m\{x : F^{-1}(x) \le a\} = m[0, F(a)] = F(a). \]

In the general case, one can replace $F^{-1}$ with the generalized inverse, which maps
all points in $[F(x-), F(x+)]$ to $x$ and points $y \in [0,1]$ for which $F^{-1}(\{y\})$ is an
interval, which must have the form $[c, d)$ or $[c, d]$ since $F$ is right continuous, to $c$.
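
The generalized inverse can be illustrated numerically. The following Python
sketch is not part of the original notes: it assumes the common convention
$F^{-1}(y) = \inf\{x : F(x) \ge y\}$, approximates it by bisection, and the names
`generalized_inverse`, `exp_cdf`, `coin_cdf` and `sample` are hypothetical.

```python
import math
import random

def generalized_inverse(F, y, lo=-1e6, hi=1e6, iters=200):
    """Approximate inf{x : F(x) >= y} by bisection.

    Assumes F is non-decreasing and right continuous with F(lo) < y <= F(hi).
    """
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if F(mid) >= y:
            hi = mid  # mid belongs to {x : F(x) >= y}
        else:
            lo = mid  # the infimum lies to the right of mid
    return hi

def exp_cdf(x):
    """Continuous, strictly increasing on [0, inf): exponential(1)."""
    return 1.0 - math.exp(-x) if x >= 0 else 0.0

def coin_cdf(x):
    """Step function with jumps at 0 and 1: a fair coin flip."""
    if x < 0:
        return 0.0
    return 0.5 if x < 1 else 1.0

def sample(F):
    """Draw xi = F^{-1}(U) with U uniform on (0, 1]."""
    u = 1.0 - random.random()
    return generalized_inverse(F, u)
```

For the step function `coin_cdf`, every level in (0, 1/2] is sent to the jump at 0
and every level in (1/2, 1] to the jump at 1, which is how a jump of $F$ turns
into an atom of the distribution of $\xi$.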


Example. Construction of a sequence of uniform random variables.
Again take $(X, \mathcal{M}, P) = ([0,1], \mathcal{B}, m)$. Represent each $x \in [0,1]$ with its
binary expansion

\[ x = \sum_{n=1}^{\infty} a_n(x) 2^{-n}. \]

Each $a_n(x)$ is a $\{0,1\}$-valued measurable function of $x$, since $a_n^{-1}(\{1\})$ is a union
of $2^{n-1}$ intervals (of length $2^{-n}$). Let $\{n_j^i\}_{j=1}^{\infty}$, $i = 1, 2, \ldots$ be disjoint sequences
and let

\[ \xi_i = \sum_{j=1}^{\infty} a_{n_j^i} 2^{-j}. \]

Then $\xi_i$ is measurable for each $i$ by Theorems 4.5 and 4.6 (why do we need them
both?) and clearly $P(\xi_i \le a) = a$ as in the first of the previous examples. □
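
To make the digit-splitting concrete, here is a small Python sketch (an
illustration, not from the original notes). It assumes the particular choice
$n_j^i = 2^{i-1}(2j-1)$ of disjoint index sequences, uses the terminating binary
expansion for dyadic rationals, and truncates the infinite sum; the function
names are hypothetical.

```python
from fractions import Fraction

def binary_digits(x, n):
    """First n binary digits a_1, ..., a_n of x in [0, 1), computed exactly."""
    digits = []
    for _ in range(n):
        x *= 2
        bit = 1 if x >= 1 else 0
        digits.append(bit)
        x -= bit
    return digits

def index(i, j):
    """Assumed disjoint sequences: n_j^i = 2^(i-1) * (2j - 1), for i, j >= 1."""
    return 2 ** (i - 1) * (2 * j - 1)

def xi(x, i, terms):
    """Truncation of xi_i(x) = sum_j a_{n_j^i}(x) 2^{-j} to the first `terms` terms."""
    a = binary_digits(x, index(i, terms))
    return sum(Fraction(a[index(i, j) - 1], 2 ** j) for j in range(1, terms + 1))
```

With $x = 5/8 = 0.101_2$, the first sequence reads the digits in positions
1, 3, 5, ... and the second those in positions 2, 6, 10, ..., so the two
variables are built from non-overlapping digits of the same $x$.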


Example. Construction of a sequence of fair coin flips.

With the same setting as in the previous example, let simply $\xi_i(x) = a_i(x)$. □


We end this section with a few notes on completeness. Suppose that $g$ is
$\mathcal{M}$-measurable and that $f = g$ a.e. If $\mu$ is complete, then this implies that $f$ is
measurable. However if $\mu$ is not complete, then this may not be the case. On the
other hand, by the construction of the completion $\bar{\mu}$ of $\mu$, it is clear that $f$ is
$\bar{\mathcal{M}}$-measurable. Similarly, if $\mu$ is complete, $f_1, f_2, \ldots$ are measurable and
$f_n \to f$ a.e., then $f$ is measurable. (These facts make up Proposition 2.11 in Folland.)
Vice versa, if $f$ is $\bar{\mathcal{M}}$-measurable, then there exists an $\mathcal{M}$-measurable function $g$
such that $f = g$ $\bar{\mu}$-a.e. (This last fact is Proposition 2.12 in Folland.)


4.1 Product-σ-algebras and complex measurable functions


Let $(Y, \mathcal{N})$ be a measurable space and $f : X \to Y$. Then the σ-algebra on $X$
generated by $f$ is given by

\[ \sigma(f) := \sigma\{f^{-1}(A) : A \in \mathcal{N}\}. \]

In other words, $\sigma(f)$ is the smallest σ-algebra on $X$ that makes $f$ measurable. (In
fact $\{f^{-1}(A) : A \in \mathcal{N}\}$ is a σ-algebra (prove this!), so $\sigma(f)$ equals this set.)
More generally, if $\mathcal{F}$ is a family of functions from $X$ to $Y$, then

\[ \sigma(\mathcal{F}) := \sigma\{f^{-1}(A) : f \in \mathcal{F}, A \in \mathcal{N}\}. \]
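
On a finite set the claim that $\{f^{-1}(A) : A \in \mathcal{N}\}$ is already a σ-algebra can
be checked by brute force. The Python sketch below is only an illustration under
the assumption that $X$ is finite and that $\mathcal{N}$ is the power set of the range of
$f$; the helper names are hypothetical.

```python
from itertools import combinations

def sigma_generated_by(X, f):
    """All preimages f^{-1}(A), A ranging over subsets of the (finite) range of f."""
    values = sorted({f(x) for x in X})
    sets = set()
    for r in range(len(values) + 1):
        for A in combinations(values, r):
            sets.add(frozenset(x for x in X if f(x) in A))
    return sets

def is_sigma_algebra(X, sets):
    """On a finite set: contains X, closed under complements and (finite) unions."""
    X = frozenset(X)
    return (
        X in sets
        and all(X - A in sets for A in sets)
        and all(A | B in sets for A in sets for B in sets)
    )
```

For instance, with $X = \{1, 2, 3, 4\}$ and $f(x) = x \bmod 2$, the generated
σ-algebra consists of the empty set, the odd numbers, the even numbers and $X$.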


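Now let $(X_1, \mathcal{M}_1)$ and $(X_2, \mathcal{M}_2)$ be two measurable spaces. The projection maps
$\pi_1$ and $\pi_2$ are given by

\[ \pi_i : X_1 \times X_2 \to X_i, \qquad \pi_i(x_1, x_2) = x_i, \qquad i = 1, 2. \]

Definition 4.7 The product σ-algebra of $\mathcal{M}_1$ and $\mathcal{M}_2$ is given by

\[ \mathcal{M}_1 \times \mathcal{M}_2 := \sigma(\pi_1, \pi_2) = \sigma\{E_1 \times E_2 : E_i \in \mathcal{M}_i, i = 1, 2\}. \]

More generally

\[ \prod_{n=1}^{\infty} \mathcal{M}_n = \sigma\{\pi_n : n = 1, 2, \ldots\} = \sigma\Big\{\prod_{n=1}^{\infty} E_n : E_n \in \mathcal{M}_n\Big\} \]

and for a general index set $I$

\[ \prod_{\alpha \in I} \mathcal{M}_\alpha = \sigma\{\pi_\alpha : \alpha \in I\}
   = \sigma\Big\{\prod_{\alpha \in I} E_\alpha : E_\alpha \in \mathcal{M}_\alpha \text{ and } E_\alpha = X_\alpha \text{ for all but countably many } \alpha\Big\}. \]

Make sure that you understand the equalities in the definitions.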


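Proposition 4.8 Let $(X, \mathcal{M})$ and $(Y_\alpha, \mathcal{N}_\alpha)$, $\alpha \in I$, be measurable spaces. A map
$h = (f_\alpha)_{\alpha \in I} : X \to \prod_{\alpha \in I} Y_\alpha$ is $(\mathcal{M}, \prod_{\alpha \in I} \mathcal{N}_\alpha)$-measurable if and only if each
$f_\alpha$ is $(\mathcal{M}, \mathcal{N}_\alpha)$-measurable.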


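Proof. Since $f_\alpha = \pi_\alpha \circ h$, a composition of two measurable maps, the only if
direction holds. On the other hand, if all $f_\alpha$ are measurable, then for any $\alpha$ and
$A \in \mathcal{N}_\alpha$,

\[ h^{-1}(\pi_\alpha^{-1}(A)) = (\pi_\alpha \circ h)^{-1}(A) = f_\alpha^{-1}(A) \in \mathcal{M}. \]

Since $\prod_\alpha \mathcal{N}_\alpha$ is generated by $\pi_\alpha$, $\alpha \in I$, the if direction now follows from
Theorem 4.2. □

Proposition 4.9 $\mathcal{B}(\mathbb{R}^2) = \mathcal{B}(\mathbb{R}) \times \mathcal{B}(\mathbb{R})$.


Proof. Let $\mathcal{A} = \{(a_1, b_1) \times (a_2, b_2) : a_1, b_1, a_2, b_2 \in \mathbb{Q}\}$. Since any open set in
$\mathbb{R}^2$ can be written as a countable union of sets in $\mathcal{A}$, we have $\mathcal{B}(\mathbb{R}^2) = \sigma(\mathcal{A})$. By
definition $\mathcal{B}(\mathbb{R}) \times \mathcal{B}(\mathbb{R})$ contains $\mathcal{A}$ and hence $\mathcal{B}(\mathbb{R}) \times \mathcal{B}(\mathbb{R}) \supseteq \mathcal{B}(\mathbb{R}^2)$.

On the other hand, $\mathcal{B}(\mathbb{R}) \times \mathcal{B}(\mathbb{R})$ is generated by $\pi_i^{-1}(A)$, $A \in \mathcal{B}(\mathbb{R})$, $i = 1, 2$.
We have $\pi_1^{-1}(A) = A \times \mathbb{R}$, so it suffices to show that $A \times \mathbb{R} \in \mathcal{B}(\mathbb{R}^2)$ for every
$A \in \mathcal{B}(\mathbb{R})$. (The similar statement for $\pi_2$ is of course analogous.) Since $A \times \mathbb{R}$ is
open in $\mathbb{R}^2$ whenever $A$ is open in $\mathbb{R}$, this holds for all open $A$. Hence, the family
$\{A \in \mathcal{B}(\mathbb{R}) : A \times \mathbb{R} \in \mathcal{B}(\mathbb{R}^2)\}$ contains all open sets, so if we can show that it is
also a σ-algebra, we are done. This, however, is obvious. □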


Two immediate corollaries follow. 


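Corollary 4.10 $\mathcal{B}(\mathbb{C}) = \mathcal{B}(\mathbb{R}) \times \mathcal{B}(\mathbb{R})$.


Corollary 4.11 A function $f : X \to \mathbb{C}$ is $(\mathcal{M}, \mathcal{B}(\mathbb{C}))$-measurable if and only if
$\Re f$ and $\Im f$ are both measurable.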


4.2 Independent random variables 
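In the next sections $(X, \mathcal{M}, P)$ will be a probability space.

Definition 4.12 Let $I$ be an arbitrary set and let $\mathcal{E}_\alpha$, $\alpha \in I$, be subclasses of $\mathcal{M}$.

• We say that $\{\mathcal{E}_\alpha\}_{\alpha \in I}$ is independent if

\[ P\Big(\bigcap_{j \in J} E_j\Big) = \prod_{j \in J} P(E_j) \]

for all finite $J \subseteq I$ and all $E_j \in \mathcal{E}_j$, $j \in J$.

• The family of random variables $\{\xi_\alpha\}_{\alpha \in I}$ is said to be independent if
$\{\sigma(\xi_\alpha)\}_{\alpha \in I}$ is independent.

• The family of events $\{E_\alpha\}_{\alpha \in I}$ is said to be independent if $\{\chi_{E_\alpha}\}_{\alpha \in I}$ is
independent.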


The given definition is completely general in terms of the index set $I$. Although
having $I$ uncountable can be useful sometimes, e.g. when defining Gaussian
white noise, it will not be so here, so in the sequel $I$ will be either finite or
countably infinite.

Lemma 4.13 Assume that $\mathcal{I}, \mathcal{J} \subseteq \mathcal{M}$ are two π-systems and let $\mathcal{N} = \sigma(\mathcal{I})$ and
$\mathcal{O} = \sigma(\mathcal{J})$. Then $\{\mathcal{N}, \mathcal{O}\}$ is independent if and only if $\{\mathcal{I}, \mathcal{J}\}$ is independent.


Proof. The only if direction is trivial. The if direction will be proved by a
two-step procedure. First fix arbitrary $I \in \mathcal{I}$ and define two measures on $\mathcal{O}$ by,
for each $B \in \mathcal{O}$, setting

\[ \mu_1(B) = P(I \cap B), \qquad \mu_2(B) = P(I)P(B). \]

By hypothesis $\mu_1$ and $\mu_2$ agree on $\mathcal{J}$ and $\mu_1(X) = \mu_2(X) \le 1 < \infty$, so by the
Uniqueness Theorem for measures, $\mu_1 = \mu_2$. Next fix arbitrary $B \in \mathcal{O}$ and define
two measures on $\mathcal{N}$ by setting, for each $A \in \mathcal{N}$,

\[ \mu_3(A) = P(A \cap B), \qquad \mu_4(A) = P(A)P(B). \]

By what we just proved, $\mu_3$ and $\mu_4$ agree on $\mathcal{I}$. They are also finite and agree on
$X$, and are hence equal. This proves independence. □


Clearly Lemma 4.13 extends to all finite collections of π-systems and their
generated σ-algebras. Since independence of an infinite family of σ-algebras is
equivalent to independence of finite subfamilies, Lemma 4.13 also extends to:


Corollary 4.14 Let $\mathcal{I}_1, \mathcal{I}_2, \ldots \subseteq \mathcal{M}$ be π-systems. If $\{\mathcal{I}_1, \mathcal{I}_2, \ldots\}$ is independent,
then also $\{\sigma(\mathcal{I}_1), \sigma(\mathcal{I}_2), \ldots\}$ is independent.


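The following two examples are important. First observe the following useful
fact. Let $f : X \to (Y, \mathcal{N})$ and suppose that $\mathcal{E} \subseteq \mathcal{P}(Y)$ generates $\mathcal{N}$. Then
$\{f^{-1}(E) : E \in \mathcal{E}\}$ generates $\sigma(f)$; this is so since, writing $\mathcal{G}$ for the σ-algebra
generated by $\{f^{-1}(E) : E \in \mathcal{E}\}$, the class $\{E \subseteq Y : f^{-1}(E) \in \mathcal{G}\}$ is a σ-algebra
containing $\mathcal{E}$ (by the commutativity of inverse images and basic set operations)
and hence contains $\mathcal{N}$.

Example. Let $\xi$ and $\eta$ be two random variables. Then $\{\xi^{-1}(-\infty, a] : a \in \mathbb{R}\}$
and $\{\eta^{-1}(-\infty, b] : b \in \mathbb{R}\}$ are π-systems and generate $\sigma(\xi)$ and $\sigma(\eta)$ respectively.
Hence by Lemma 4.13 $\{\xi, \eta\}$ is independent iff
$P(\xi^{-1}(-\infty, a] \cap \eta^{-1}(-\infty, b]) = P(\xi^{-1}(-\infty, a]) P(\eta^{-1}(-\infty, b])$ for all $a, b$, i.e. iff

\[ P(\xi \le a, \eta \le b) = P(\xi \le a) P(\eta \le b) \]

for all $a, b \in \mathbb{R}$. More generally, by Corollary 4.14, $\{\xi_1, \xi_2, \ldots\}$ is independent iff

\[ P(\xi_{i_1} \le a_1, \ldots, \xi_{i_n} \le a_n) = \prod_{k=1}^{n} P(\xi_{i_k} \le a_k) \]

for all $n = 1, 2, \ldots$, all $1 \le i_1 < \ldots < i_n$ and all $a_1, \ldots, a_n \in \mathbb{R}$. □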


For trivial reasons, $\{f(\xi), g(\eta)\}$ are independent whenever $\xi$ and $\eta$ are
independent. (Check that you understand why!) Analogously, if
$\{\{\xi_1, \xi_2, \ldots\}, \{\eta_1, \eta_2, \ldots\}\}$
is an independent pair of families of random variables (i.e. an independent pair
of $\mathbb{R}^\infty$-valued random variables; there is nothing in the above definitions that
prevents us from considering random variables taking on values in an arbitrary
space), then $f(\xi_1, \xi_2, \ldots)$ and $g(\eta_1, \eta_2, \ldots)$ are independent.

It is intuitively clear that if $\{\xi_1, \xi_2, \ldots\}$ is independent, then, if we extract two
disjoint subfamilies, these two should make an independent pair of $\mathbb{R}^\infty$-valued
random variables. The next example shows that this is indeed the case.


Example. Let $\xi_1, \xi_2, \ldots$ be independent random variables and let $I$ and $J$ be two
disjoint index sets (i.e. $I, J \subseteq \mathbb{N}$ and $I \cap J = \emptyset$). Then
$\{\{\xi_{i_1} \le a_1, \ldots, \xi_{i_n} \le a_n\} : n = 1, 2, \ldots, i_1 < \ldots < i_n \in I, a_1, \ldots, a_n \in \mathbb{R}\}$
is a π-system that generates $\sigma(\xi_i : i \in I)$ and the analogous π-system generates
$\sigma(\xi_j : j \in J)$.

By the previous example, the two π-systems are independent. Hence the
collections $(\xi_i : i \in I)$ and $(\xi_j : j \in J)$ are independent, by Corollary 4.14.


To relax our language a bit, let us take the statement "$\xi_1, \xi_2, \ldots$ are
independent" to mean that the family $\{\xi_1, \xi_2, \ldots\}$ is independent. Note that it is actually
important to spell this out, since another interpretation of the statement could have
been that the random variables are all pairwise independent. This, however, is a
much weaker statement. Consider for example the three $\{0,1\}$-valued random
variables $\xi_1, \xi_2, \xi_3$ given by $P(\xi_1 = 0, \xi_2 = 0, \xi_3 = 1) = P(\xi_1 = 0, \xi_2 = 1, \xi_3 = 0)
= P(\xi_1 = 1, \xi_2 = 0, \xi_3 = 0) = P(\xi_1 = 1, \xi_2 = 1, \xi_3 = 1) = 1/4$, which are
pairwise independent, but clearly not independent since any of them is determined
by the other two (it is the complement of their xor sum). Hence, in the sequel,
saying that a set of random variables are independent means something stronger
than saying that the same random variables are pairwise independent.
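
The three-variable example is small enough to verify by exhaustive enumeration.
The following Python sketch (an illustration added here, with hypothetical helper
names) encodes the four outcomes with their probabilities and checks pairwise
independence and the failure of joint independence using exact fractions.

```python
from fractions import Fraction
from itertools import combinations, product

# The four equally likely outcomes (xi_1, xi_2, xi_3) from the text.
OUTCOMES = {
    (0, 0, 1): Fraction(1, 4),
    (0, 1, 0): Fraction(1, 4),
    (1, 0, 0): Fraction(1, 4),
    (1, 1, 1): Fraction(1, 4),
}

def prob(event):
    """Probability of the set of outcomes w for which event(w) is true."""
    return sum(p for w, p in OUTCOMES.items() if event(w))

def pairwise_independent():
    for i, j in combinations(range(3), 2):
        for a, b in product((0, 1), repeat=2):
            joint = prob(lambda w: w[i] == a and w[j] == b)
            if joint != prob(lambda w: w[i] == a) * prob(lambda w: w[j] == b):
                return False
    return True

def jointly_independent():
    for values in product((0, 1), repeat=3):
        joint = prob(lambda w: w == values)
        marginals = [prob(lambda w, k=k: w[k] == values[k]) for k in range(3)]
        if joint != marginals[0] * marginals[1] * marginals[2]:
            return False
    return True
```

The joint check fails already at the outcome (0, 0, 0), which has probability 0
rather than 1/8.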


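Theorem 4.15 (Borel-Cantelli's Second Lemma)
Let $E_1, E_2, \ldots$ be a sequence of independent events. If $\sum_1^\infty P(E_n) = \infty$, then
$P(\limsup_n E_n) = 1$.


Proof. Note that

\[ (\limsup_n E_n)^c = \Big(\bigcap_m \bigcup_{n \ge m} E_n\Big)^c = \bigcup_m \bigcap_{n \ge m} E_n^c \]

so by the continuity of measures, it suffices to show that $P(\bigcap_{n \ge m} E_n^c) = 0$ for all
$m$. This in turn follows from the following computations, where the inequality
uses $1 - x \le e^{-x}$:

\[ P\Big(\bigcap_{n \ge m} E_n^c\Big) = \lim_N P\Big(\bigcap_{n=m}^{N} E_n^c\Big) = \lim_N \prod_{n=m}^{N} (1 - P(E_n))
   \le \lim_N \exp\Big(-\sum_{n=m}^{N} P(E_n)\Big) = 0. \]

□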
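Example. Let $\xi_1, \xi_2, \ldots$ be independent random variables with exponential(1)
distribution, i.e.

\[ P(\xi_n > x) = e^{-x}, \qquad x \ge 0. \]

Then

\[ P(\xi_n > \alpha \log n) = e^{-\alpha \log n} = n^{-\alpha}. \]

Hence $\sum_1^\infty P(\xi_n > \alpha \log n)$ is finite for $\alpha > 1$ and infinite for $\alpha \le 1$. By the
Borel-Cantelli Lemmas, this entails that

• if $\alpha \le 1$, then almost surely $\xi_n > \alpha \log n$ for infinitely many $n$,

• if $\alpha > 1$, then almost surely $\xi_n > \alpha \log n$ for only finitely many $n$.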
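
The dichotomy rests only on whether $\sum_n n^{-\alpha}$ converges. As a rough numerical
illustration (not part of the original notes, and with hypothetical function
names), the Python sketch below computes partial sums of the tail probabilities
$P(\xi_n > \alpha \log n) = n^{-\alpha}$.

```python
import math

def tail_prob(alpha, n):
    """P(xi_n > alpha * log n) = n^(-alpha) for an exponential(1) variable."""
    return n ** (-alpha)

def partial_sum(alpha, N):
    """Sum of the tail probabilities for n = 1, ..., N."""
    return sum(tail_prob(alpha, n) for n in range(1, N + 1))
```

For $\alpha = 2$ the partial sums stay below $\pi^2/6$, while for $\alpha = 1$ they exceed
$\log N$ and so grow without bound; a finite computation can of course only
suggest, not prove, divergence.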


4.3 Kolmogorov's 0-1-law


Let $\xi_1, \xi_2, \ldots$ be independent random variables. For each $n$, let

\[ \mathcal{T}_n = \sigma(\xi_{n+1}, \xi_{n+2}, \ldots) \]

and

\[ \mathcal{T} = \bigcap_n \mathcal{T}_n. \]

The σ-algebra $\mathcal{T}$ is called the tail-σ-algebra (w.r.t. $\xi_1, \xi_2, \ldots$). A set $E \in \mathcal{T}$ is
called a tail event and a random variable which is $\mathcal{T}$-measurable is called a tail
function of the $\xi_n$'s.

A tail event does not, for any $n$, depend on the first $n$ of the $\xi_k$'s, so at a
first glance it may seem that $\mathcal{T}$ should be trivial. This, however, would be the
wrong impression, since $\mathcal{T}$ actually contains a lot of interesting events. E.g. the
event $\{x \in X : \lim_n \xi_n(x) \text{ exists}\}$ is a tail event and $\eta = \limsup_n \frac{1}{n} \sum_{k=1}^{n} \xi_k$
is a tail function; they are $\mathcal{T}_n$-measurable for every $n$ and hence $\mathcal{T}$-measurable.
Kolmogorov's 0-1-law states that the probability for a tail event must be either 0
or 1 and that any tail function must be a constant a.s.


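Theorem 4.16 (Kolmogorov's 0-1-law)
Let $\xi_1, \xi_2, \ldots$ be independent random variables.


(i) If $E \in \mathcal{T}$, then $P(E) \in \{0, 1\}$,


(ii) If $\eta$ is $\mathcal{T}$-measurable, then there exists a constant $c \in \mathbb{R}$ such that $\eta = c$ a.s.


Proof.


(i) Let $\mathcal{F}_n = \sigma(\xi_1, \ldots, \xi_n)$, $n = 1, 2, \ldots$. By the above example, $\mathcal{F}_n$ and $\mathcal{T}_n$
are independent. Since $\mathcal{T} \subseteq \mathcal{T}_n$, $\mathcal{F}_n$ and $\mathcal{T}$ are independent for every $n$.
Hence $\bigcup_n \mathcal{F}_n$ and $\mathcal{T}$ are independent. Since $\bigcup_n \mathcal{F}_n$ is a π-system, it
follows that $\sigma(\bigcup_n \mathcal{F}_n)$ and $\mathcal{T}$ are independent. However
$\mathcal{T} \subseteq \sigma(\xi_1, \xi_2, \ldots) = \sigma(\bigcup_n \mathcal{F}_n)$, so $\mathcal{T}$ is independent of itself. This means
that for each $E \in \mathcal{T}$,

\[ P(E) = P(E \cap E) = P(E)^2 \]

which entails that $P(E)$ is either 0 or 1.


(ii) For all $a \in \mathbb{R}$, $P(\eta \le a) \in \{0, 1\}$ by (i). Let $c = \inf\{a : P(\eta \le a) = 1\}$.
Then

\[ P(\eta \le c) = P\Big(\bigcap_n \{x : \eta(x) \le c + \tfrac{1}{n}\}\Big) = 1 \]

and

\[ P(\eta < c) = P\Big(\bigcup_n \{x : \eta(x) \le c - \tfrac{1}{n}\}\Big) = 0, \]

so $\eta = c$ a.s. □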

Example. (Monkey typing Shakespeare)
Suppose that a monkey is typing uniform random keys on a laptop. There are,
say, $N$ keys on the laptop. The collected works of Shakespeare (to be abbr. CWS)
comprises, say, $M$ symbols. Let $E$ be the event that the monkey happens to type
CWS eventually. Will $E$ occur?

If we let $F$ be the event that the monkey types CWS infinitely many times, then
by Kolmogorov's 0-1-law, $P(F)$ is 0 or 1. Let $F_n$ be the event that the monkey
types CWS with the $(nM + 1)$'th to $(n + 1)M$'th symbols it types. Then $P(F_n) = 1/N^M$,
so $\sum_n P(F_n) = \infty$ and hence $P(\limsup_n F_n) = 1$ by Borel-Cantelli.
Hence

\[ P(E) \ge P(F) \ge P(\limsup_n F_n) = 1. \]

So the answer is yes, the monkey will eventually type CWS (but, of course, very
much provided that it has an infinite life and can be persuaded to spend an infinite
amount of time at the laptop). □
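
Since the blocks $F_n$ are independent with $P(F_n) = N^{-M}$, the probability that
at least one of the first $n$ blocks spells out the text is $1 - (1 - N^{-M})^n$. The
Python sketch below (an added illustration with hypothetical names, using toy
values of $N$ and $M$) evaluates this exactly; it is only a lower bound for the
probability of having typed the text, since occurrences straddling two blocks
are ignored.

```python
from fractions import Fraction

def prob_typed_within(num_keys, length, blocks):
    """Probability that at least one of the first `blocks` disjoint blocks
    of `length` symbols equals a fixed target text, with `num_keys` keys."""
    p = Fraction(1, num_keys ** length)  # P(F_n) = 1 / N^M
    return 1 - (1 - p) ** blocks
```

With a 2-key keyboard and a 3-symbol text, one block succeeds with probability
1/8 and the probability of at least one success increases to 1 as the number of
blocks grows.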


Note that the key in the example was really Borel-Cantelli's Second Lemma
and that the information provided by Kolmogorov's 0-1-law was only that
$P(F) \in \{0, 1\}$. In the next example, the 0-1-law plays a more vital role.


Example. (Percolation) 

Consider the two-dimensional integer lattice, i.e. the graph obtained by placing
a vertex at each integer point $(n, k)$ in the Euclidean plane and placing an
edge between $(n, k)$ and $(m, j)$ if either $n = m$ and $|k - j| = 1$ or $k = j$ and
$|n - m| = 1$. Now remove edges at random by letting each edge be kept (or
open¹) with probability $p$ and removed (or closed) with probability $1 - p$,
independently of other edges. The resulting random graph will of course a.s. fall
into (infinitely many) connected components. However, will there be an infinitely
large connected component?

Let $E$ be the event that an infinite connected component exists. Let $\xi_i$ be
the status, i.e. kept or removed, of edge number $i$; here assume that edges are
numbered according to their distance from the origin and arbitrarily among those
edges that are equally far away. Now observe that $E$ is a tail event. This is so since
the presence or absence of infinite components cannot be changed by changing the
status of the first $n$ edges no matter the value of $n$. (For an outcome where infinite
components exist, changing a finite number of edges can change the number of
such components, but never change presence/absence.) Hence, by Kolmogorov's
0-1-law, $P(E)$ is 0 or 1.

Determining for what $p$ we have $P(E) = 0$ and for what $p$ we have $P(E) = 1$
is a different story. This is of course a general fact about applications of
Kolmogorov's 0-1-law; it tells us that a tail event has probability 0 or 1, but never
tells which it is. However, knowing that $P(E)$ is 0 or 1 is still very helpful since
if we can also show that $P(E) > 0$, then it follows immediately that $P(E) = 1$.

In the percolation setting of this example, consider the probability that no
vertex in the $2n \times 2n$-box centered at the origin is part of an infinite path of kept
edges. It can be shown that this probability is bounded by $n(3(1 - p))^n$. (This is
done by bounding the number of ways that the box can be "cut off from infinity".)
This is less than 1 for large enough $n$ if $p > 2/3$. Hence $P(E) = 1$ for $p > 2/3$.
On the other hand, by similar counting, it is easy to see that $P(E) = 0$ for $p < 1/3$.
In fact, the critical probability for when $P(E)$ switches from 0 to 1 is $p = 1/2$.
This is a central and highly non-trivial fact of percolation theory. (When $p = 1/2$,
then $P(E) = 0$.) □

¹Percolation theory has its origins in the study of water flow through porous materials. The
edges then represent microscopic channels which may or may not be open for water flow.


5 Integration of nonnegative functions 


Defining the Lebesgue integral is a stepwise procedure. It starts with nonnegative 
simple functions. 


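Definition 5.1 A function $\phi : (X, \mathcal{M}, \mu) \to \mathbb{C}$ is said to be simple if it is of the
form

\[ \phi(x) = \sum_{j=1}^{n} z_j \chi_{E_j}(x) \]

for some $n$, where $z_j \in \mathbb{C}$ and $\{E_1, \ldots, E_n\}$ is a partition of $X$ such that $E_j \in \mathcal{M}$
for all $j$.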


Let $L^+(X, \mathcal{M}, \mu)$ denote the set of all $\mathcal{M}$-measurable functions $f : X \to [0, \infty]$.
Depending on the level of risk for confusion, we often use shorthand
notations such as $L^+(X)$, $L^+(\mathcal{M})$ or simply $L^+$.


Definition 5.2 Let $\phi = \sum_{j=1}^{n} a_j \chi_{E_j}$, $a_j \in \mathbb{R}_+$, be simple. Then the integral of
$\phi$ with respect to $\mu$ is given by

\[ \int_X \phi(x) \, d\mu(x) = \sum_{j=1}^{n} a_j \mu(E_j). \]


Example. Let $(X, \mathcal{M}, \mu) = ([0,1], \mathcal{L}, m)$ and $\phi = \chi_{\mathbb{Q} \cap [0,1]}$. Since $\mathbb{Q} \cap [0,1]$ is
countable, it is measurable, so $\phi$ is a simple function and $\int \phi \, dm = 0$. Compare
this with what happens if we try to calculate the Riemann integral of this function.
Since the Riemann integral is defined in terms of approximations of $\phi$ from above
and from below by simple functions that are constant on intervals, we find that the
Riemann integral of $\phi$ is not defined. Thus, there are functions defined on an
interval of the real line which the Lebesgue integral can handle, but which the
Riemann integral cannot. Later, we will also see that any Riemann integrable
function on an interval is Lebesgue integrable and that for such functions, the two
methods give the same result. □


Alternative and/or shorthand notations for the integral are $\int_X \phi(x) \, \mu(dx)$, $\int \phi \, d\mu$
and $\int \phi$. The representation of a simple function as a finite linear combination of
characteristic functions is of course not unique, but it is easy to see that different
representations give the same result, so the integral is well-defined. For $A \in \mathcal{M}$,
write

\[ \int_A \phi \, d\mu = \int_X \phi \chi_A \, d\mu. \]

This is well-defined since $\phi \chi_A = \sum_{j=1}^{n} a_j \chi_{A \cap E_j} + 0 \cdot \chi_{A^c}$ is simple. A few basic
facts follow.


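Proposition 5.3 Let $c \in \mathbb{R}_+$ and let $\phi = \sum_{j=1}^{n} a_j \chi_{E_j}$, $\psi = \sum_{j=1}^{m} b_j \chi_{F_j} \in L^+$ be simple
functions. Then

(a) $\int c\phi = c \int \phi$,

(b) $\int (\phi + \psi) = \int \phi + \int \psi$,

(c) $\phi \le \psi \Rightarrow \int \phi \le \int \psi$,

(d) The map $A \mapsto \int_A \phi$, $A \in \mathcal{M}$, is a measure.

Proof. Part (a) is trivial. For part (b) observe that

\[ \phi + \psi = \sum_{i,j} (a_i + b_j) \chi_{E_i \cap F_j}. \]

Hence

\[ \int (\phi + \psi) = \sum_{i,j} (a_i + b_j) \mu(E_i \cap F_j)
   = \sum_i a_i \sum_j \mu(E_i \cap F_j) + \sum_j b_j \sum_i \mu(E_i \cap F_j) \]
\[ = \sum_i a_i \mu(E_i) + \sum_j b_j \mu(F_j) = \int \phi + \int \psi. \]

For part (c) use the representations $\phi = \sum_i \sum_j a_i \chi_{E_i \cap F_j}$ and $\psi = \sum_i \sum_j b_j \chi_{E_i \cap F_j}$.
On each nonempty $E_i \cap F_j$ we have $a_i \le b_j$, so the result follows immediately from the
definition.

To prove part (d), it must be shown that if $A_1, A_2, \ldots$ are disjoint sets in $\mathcal{M}$,
then $\int_{\bigcup_k A_k} \phi = \sum_k \int_{A_k} \phi$. We have

\[ \int_{\bigcup_k A_k} \phi = \sum_{j=1}^{n} a_j \mu\Big(E_j \cap \bigcup_{k=1}^{\infty} A_k\Big)
   = \sum_{j=1}^{n} a_j \sum_{k=1}^{\infty} \mu(E_j \cap A_k) \]
\[ = \sum_{k=1}^{\infty} \sum_{j=1}^{n} a_j \mu(E_j \cap A_k) = \sum_{k=1}^{\infty} \int_{A_k} \phi \]

where the second equality is countable additivity of $\mu$. □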


The next step is to define integrals of arbitrary functions in $L^+$ by approximating
with simple functions. The following approximation result tells us that it
makes sense to do so.


Theorem 5.4 (a) Let $f \in L^+$. There are simple functions $\phi_n \in L^+$ such that
$\phi_n(x) \uparrow f(x)$ for every $x \in X$.


(b) Let $f : X \to \mathbb{C}$ be measurable. Then there are simple functions $\phi_n$ such
that $|\phi_1| \le |\phi_2| \le \ldots \le |f|$ and $\phi_n \to f$ pointwise.


Proof. In (a), let $A_j = \{x : f(x) \in [j2^{-n}, (j+1)2^{-n})\}$, $j = 0, \ldots, n2^n - 1$, and
let $A_{n2^n} = \{x : f(x) \ge n\}$. Since $f$ is measurable, all these sets are measurable,
so letting

\[ \phi_n(x) = \sum_{j=0}^{n2^n} j 2^{-n} \chi_{A_j}(x) \]

gives $\phi_n$'s of the desired form.
For (b), apply the proof of (a) to all four parts of $f$; see below for definitions. □

In the light of Theorem 5.4, we make the following definition. 


Definition 5.5 Let $f \in L^+$. Then

\[ \int_X f(x) \, d\mu(x) := \sup\Big\{ \int_X \phi(x) \, d\mu(x) : 0 \le \phi \le f, \ \phi \text{ simple} \Big\}. \]

For $A \in \mathcal{M}$,

\[ \int_A f \, d\mu = \int f \chi_A \, d\mu. \]

It is obvious that if $c \in \mathbb{R}_+$, then $\int cf = c \int f$ and if $f \le g$, $f, g \in L^+$, then
$\int f \le \int g$. The next result is one of the key results in integration theory.


Theorem 5.6 (The Monotone Convergence Theorem)
Assume that $f_n, f \in L^+$ and $f_n \uparrow f$ pointwise. Then $\int f_n \, d\mu \uparrow \int f \, d\mu$.


Proof. Since $\{f_n\}$ is increasing, $\{\int f_n\}$ is increasing and hence $\lim_n \int f_n$
exists (but may be equal to $\infty$). Since $f_n \le f$ for all $n$, $\lim_n \int f_n \le \int f$.

Now pick an arbitrary simple function $\phi \in L^+$ such that $\phi \le f$ and an arbitrary
$a \in (0, 1)$. Since $f_n \uparrow f$ pointwise, the sets $A_n := \{x : f_n(x) \ge a\phi(x)\}$ are
increasing in $n$ and $\bigcup_n A_n = X$. Since the map $A \mapsto \int_A \phi$ is a measure, it follows
from the continuity of measures that $\int_{A_n} \phi \uparrow \int \phi$. Since
$\int f_n \ge \int_{A_n} f_n \ge a \int_{A_n} \phi$, we therefore get

\[ \lim_n \int f_n \ge a \lim_n \int_{A_n} \phi = a \int \phi. \]

Since $a$ was arbitrary, letting $a \uparrow 1$ entails that $\lim_n \int f_n \ge \int \phi$. The result now
follows from the definition of $\int f$. □


The first consequence of the MCT is that the integral is additive. Indeed, it is
in fact countably additive:


Theorem 5.7 Let $f_n \in L^+$. Then

\[ \int \Big(\sum_{n=1}^{\infty} f_n\Big) d\mu = \sum_{n=1}^{\infty} \int f_n \, d\mu. \]


Proof. First consider finite additivity. By Theorem 5.4, there are simple
nonnegative functions $\phi_n$ and $\psi_n$ such that $\phi_n \uparrow f_1$ and $\psi_n \uparrow f_2$ pointwise. By the
MCT and Proposition 5.3,

\[ \int (f_1 + f_2) = \lim_n \int (\phi_n + \psi_n) = \lim_n \int \phi_n + \lim_n \int \psi_n = \int f_1 + \int f_2. \]

Now finite additivity follows by induction. Since $\sum_1^N f_n \uparrow \sum_1^\infty f_n$ as $N \to \infty$,
another application of the MCT now shows that

\[ \int \sum_{n=1}^{\infty} f_n = \lim_N \int \sum_{n=1}^{N} f_n = \lim_N \sum_{n=1}^{N} \int f_n = \sum_{n=1}^{\infty} \int f_n. \]

□

Corollary 5.8 Let $f \in L^+$. Then the map $A \mapsto \int_A f \, d\mu$, $A \in \mathcal{M}$, is a measure.


The hypothesis in the MCT is that $f_n \uparrow f$ pointwise. This can be relaxed a bit;
it suffices to have $f_n \uparrow f$ a.e. To see this, first observe that if $\int f = 0$, then we
can find simple $\phi_n \in L^+$ with $\phi_n \uparrow f$ pointwise and $\int \phi_n = 0$. However, since
$\phi_n$ is simple, this trivially means that $\phi_n = 0$ a.e. Now if $x$ is a point such that
$f(x) > 0$, then $\phi_n(x) > 0$ for all sufficiently large $n$. Hence

\[ \mu\{x : f(x) > 0\} \le \mu\Big(\bigcup_n \{x : \phi_n(x) > 0\}\Big) = 0. \]

In summary

Proposition 5.9 Let $f \in L^+$. Then $\int f \, d\mu = 0$ if and only if $f = 0$ a.e.


Suppose now that $f_n \uparrow f$ a.e. and let $E = \{x : f_n(x) \uparrow f(x)\}$. Then
$f_n \chi_E \uparrow f \chi_E$ pointwise so by the MCT, $\int f_n \chi_E \uparrow \int f \chi_E$. Since $f - f\chi_E \in L^+$
and $f - f\chi_E = 0$ a.e., Proposition 5.9 implies that $\int f \chi_E = \int f$. From the same
argument, $\int f_n \chi_E = \int f_n$. Putting these facts together gives $\int f_n \uparrow \int f$. (This
result is Corollary 2.17 in Folland.)

The MCT states that if $f_n \in L^+$ and $f_n \uparrow f$ a.e., then $\int f_n \uparrow \int f$, but
what about when $f_n \to f$ without being increasing in $n$? Does this also imply
$\int f_n \to \int f$? The answer is no, as the following example shows. Let
$(X, \mathcal{M}, \mu) = ([0,1], \mathcal{L}, m)$ and $f_n(x) = n \chi_{[0,1/n]}(x)$. Then $f_n \to 0$ a.e. (but not
pointwise, since $f_n(0) \to \infty$), but $\int f_n = 1$ for every $n$.
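
The counterexample can be checked with exact arithmetic. In the Python sketch
below (an added illustration with hypothetical names), `f` evaluates $f_n$ at a
point and `integral` applies the definition of the integral of a simple function
to $f_n = n \chi_{[0,1/n]}$.

```python
from fractions import Fraction

def f(n, x):
    """f_n(x) = n * indicator of [0, 1/n], for x in [0, 1]."""
    return n if 0 <= x <= Fraction(1, n) else 0

def integral(n):
    """Integral of the simple function f_n w.r.t. Lebesgue measure:
    n * m([0, 1/n]) + 0 * m((1/n, 1])."""
    return n * (Fraction(1, n) - 0)
```

At any fixed $x > 0$ the values $f_n(x)$ are 0 as soon as $1/n < x$, while at $x = 0$
they grow without bound, and the integrals are all equal to 1.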

Hence some further assumption is needed to guarantee that $f_n \to f$ a.e. entails
that $\int f_n \to \int f$. Such a condition will be given in the Dominated Convergence
Theorem below. Before that, we will extend the integral from nonnegative real
functions to general complex functions. First however, we finish the present section
with the important Fatou's Lemma and a note on σ-finiteness.


Theorem 5.10 (Fatou's Lemma) If $f_n \in L^+$, $n = 1, 2, \ldots$, then

\[ \int (\liminf_n f_n) \, d\mu \le \liminf_n \int f_n \, d\mu. \]

Proof. Note that $\inf_{n \ge m} f_n$ is increasing in $m$, so by the MCT,

\[ \int \liminf_n f_n = \int \lim_m \big(\inf_{n \ge m} f_n\big) = \lim_m \int \inf_{n \ge m} f_n
   \le \lim_m \inf_{n \ge m} \int f_n = \liminf_n \int f_n, \]

where the inequality follows from that $\inf_{n \ge m} f_n \le f_n$, and hence
$\int \inf_{n \ge m} f_n \le \int f_n$, for every $n \ge m$. □


An immediate consequence of Fatou's Lemma is that $\int f \, d\mu \le \liminf_n \int f_n \, d\mu$
whenever $f_n \in L^+$ and $f_n \to f$ a.e.

The final result of this section makes the observation that if $f \in L^+$ and
$\int f < \infty$, then $\mu\{x : f(x) = \infty\} = 0$, which is obvious, and that $\mu$ can always
be regarded to be σ-finite as far as $f$ is concerned: $\{x : f(x) > 0\} = \bigcup_n \{x :
f(x) > 1/n\}$ is σ-finite. This is stated in Folland as Proposition 2.20. The last
result extends to the conclusion that $\bigcup_n \{x : f_n(x) > 0\}$ is σ-finite whenever
$\int f_n \, d\mu < \infty$ for all $n$.


6 Integration of complex functions 


Consider a function $f : (X, \mathcal{M}, \mu) \to (\mathbb{R}, \mathcal{B}(\mathbb{R}))$. Define the two functions $f^+$
and $f^-$ by

\[ f^+(x) = \max(f(x), 0) \]

and

\[ f^-(x) = f^+(x) - f(x) = -\min(0, f(x)). \]

These two nonnegative functions are called the positive part and negative part of
$f$ respectively.


Definition 6.1 A function $f : (X, \mathcal{M}, \mu) \to (\mathbb{R}, \mathcal{B})$ is said to be integrable if
$\int f^+ \, d\mu < \infty$ and $\int f^- \, d\mu < \infty$. The integral of an integrable function $f$ is given
by

\[ \int f \, d\mu = \int f^+ \, d\mu - \int f^- \, d\mu. \]

A function $f : (X, \mathcal{M}, \mu) \to (\mathbb{C}, \mathcal{B})$ is said to be integrable if $\Re f$ and $\Im f$ are both
integrable, and the integral of $f$ is then given by

\[ \int f \, d\mu = \int \Re f \, d\mu + i \int \Im f \, d\mu. \]


The integral of a complex function is well-defined since measurability of $f$ is
equivalent to measurability of its real and imaginary parts, by Corollary 4.11. It is
easy to see that the integral is linear and that $f$ is integrable iff $\int |f| < \infty$.


By $L^1(X, \mathcal{M}, \mu)$ we will mean the space of all integrable complex functions
on $X$. Simplified notations are $L^1(X)$, $L^1(\mathcal{M})$, $L^1(\mu)$ or just $L^1$ when these can
be used without risk of confusion. The space $L^1$ is, as we just observed, a complex
vector space.


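Proposition 6.2 For any $f \in L^1$,

\[ \Big| \int f \, d\mu \Big| \le \int |f| \, d\mu. \]


Proof. For real-valued $f$, this is just the ordinary triangle inequality:

\[ \Big| \int f \Big| = \Big| \int f^+ - \int f^- \Big| \le \int f^+ + \int f^- = \int |f|. \]

For the general case, represent complex numbers $z$ as $z = |z| \, \mathrm{sgn}\, z$. Then
$|\int f| = \bar{\alpha} \int f$, where $\alpha = \mathrm{sgn}(\int f)$. Thus

\[ \Big| \int f \Big| = \bar{\alpha} \int f = \int \bar{\alpha} f = \int \Re(\bar{\alpha} f)
   \le \int |\Re(\bar{\alpha} f)| \le \int |\bar{\alpha} f| = |\alpha| \int |f| = \int |f|. \]

□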


Proposition 6.3 Let $f, g \in L^1$. Then

(a) $\{x : f(x) \ne 0\}$ is σ-finite,

(b) $\int_E f = \int_E g$ for all $E \in \mathcal{M}$ iff $\int |f - g| \, d\mu = 0$ iff $f = g$ a.e.


Proof. Part (a) is the corresponding result for nonnegative functions applied
to the four parts of $f$. We also saw in the previous section that $\int |f - g| = 0$ iff
$|f - g| = 0$ a.e., so the second equivalence in (b) holds. For the if direction in (b):
if $\int |f - g| = 0$, then for each $E \in \mathcal{M}$,

\[ \Big| \int_E f - \int_E g \Big| = \Big| \int \chi_E (f - g) \Big| \le \int \chi_E |f - g| \le \int |f - g| = 0. \]

For the only if direction: Assume for contradiction that $\int_E (f - g) = 0$ for all
$E \in \mathcal{M}$ and $\mu\{|f - g| > 0\} > 0$. Writing $f - g = u + iv$, we must then have
that at least one of the four parts $u^+$, $u^-$, $v^+$ and $v^-$ is nonzero on a set of positive
measure. Assume without loss of generality that this holds for $u^+$, so that
$\mu\{x : u^+(x) > 0\} > 0$. Then, for $n$ sufficiently large, $E = \{x : u^+(x) > 1/n\}$
has $\mu(E) > 0$. Then, since $u^- = 0$ on $E$,

\[ \Re \int_E (f - g) = \int_E u^+ \ge \mu(E)/n > 0, \]

a contradiction. □


Remark. In the notation $L^1$ for the space of integrable functions, it is usually
understood that the space is normed with the $L^1$-norm given by

\[ \|f\|_1 = \int |f(x)| \, d\mu(x). \]

There is actually a slight problem with this, since $\|f - g\|_1 = 0$ only implies
that $f = g$ a.e. and not that $f$ and $g$ are identical functions. This is solved by
defining equivalence classes of integrable functions by saying that $f$ and $g$ belong
to the same equivalence class if they are equal a.e. Then these equivalence classes
are formally taken to be the elements in $L^1$. Then a particular function $f$ is not
really an element of $L^1$, but rather a representative of its equivalence class. This
distinction, however, will not cause any problems in this course.


Theorem 6.4 (The Dominated Convergence Theorem)
Assume that $f_1, f_2, \ldots \in L^1$ and $f_n \to f$ a.e. Assume also that there exists an
integrable $g \in L^+$ such that $|f_n| \le g$ for every $n$. Then

\[ \int f_n \, d\mu \to \int f \, d\mu. \]


Strictly speaking, that $f_n \to f$ a.e. does not imply that $f$ is measurable. If $\mu$ is
complete, then measurability of $f$ follows. If not, then at least $f$ will be measurable
after an alteration on a null set. Let us suppress this concern and simply assume
that $f$ is measurable.

Proof. Assume without loss of generality that the $f_n$'s and $f$ are real-valued.
Then $g + f_n$ and $g - f_n$ are nonnegative by assumption. Hence Fatou's Lemma
gives on one hand

\[ \int g + \int f = \int (g + f) \le \liminf_n \int (g + f_n) = \int g + \liminf_n \int f_n \]

and on the other

\[ \int g - \int f = \int (g - f) \le \liminf_n \int (g - f_n) = \int g - \limsup_n \int f_n. \]

Since $\int g < \infty$, these two inequalities give
$\limsup_n \int f_n \le \int f \le \liminf_n \int f_n$, which proves the claim. □
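The DCT allows us to prove that the integral is countably additive under the
right assumption.


Theorem 6.5 Assume that $f_n \in L^1$ and $\sum_1^\infty \int |f_n| \, d\mu < \infty$. Then $g := \sum_1^\infty |f_n|$
is integrable and $\int (\sum_1^\infty f_n) \, d\mu = \sum_1^\infty \int f_n \, d\mu$.


Proof. Since

\[ \int g = \int \sum_{n=1}^{\infty} |f_n| = \sum_{n=1}^{\infty} \int |f_n| < \infty \]

by Theorem 5.7, $g$ is integrable and $\sum_1^\infty |f_n(x)| < \infty$ for a.e. $x$, so that $\sum_1^\infty f_n(x)$
exists for a.e. $x$. Since $|\sum_1^N f_n| \le g$ for every $N$, the DCT implies that

\[ \int \sum_{n=1}^{\infty} f_n = \lim_N \int \sum_{n=1}^{N} f_n = \lim_N \sum_{n=1}^{N} \int f_n = \sum_{n=1}^{\infty} \int f_n. \]

□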


The next result states that the set of simple functions is dense in $L^1$.


Theorem 6.6 If $\epsilon > 0$ and $f \in L^1$, then there exists a simple function
$\phi = \sum_{j=1}^{n} a_j \chi_{E_j}$, $a_j \in \mathbb{C}$, such that

\[ \|f - \phi\|_1 < \epsilon. \]

If $\mu$ is a Lebesgue-Stieltjes measure on $\mathbb{R}$, then the $E_j$'s can be taken to be open
intervals. Moreover, there exists a continuous function $g$ such that $\|f - g\|_1 < \epsilon$.


Proof. By Theorem 5.4(b), there are simple functions $\phi_k$ such that $\phi_k \to f$
pointwise and $|\phi_k| \le |f|$ for every $k$. Then $|\phi_k - f| \le 2|f|$ so by the DCT,

\[ \int |\phi_k - f| \to 0. \]

Now take $\phi = \phi_k$ for sufficiently large $k$.

Assume now that $\mu$ is a Lebesgue-Stieltjes measure and $\phi$ as in the statement
of the theorem. We have, if the $a_j$'s are nonzero,

\[ \mu(E_j) = \frac{1}{|a_j|} \int_{E_j} |\phi| \le \frac{1}{|a_j|} \int |f| < \infty. \]

Hence, by Proposition 3.16, there exists a set $A_j$ which is the finite union of
open intervals, such that $\mu(A_j \Delta E_j) < \epsilon/(n|a_j|)$. Let $\psi = \sum_{j=1}^{n} a_j \chi_{A_j}$. Then
$\int |\phi - \psi| < \epsilon$. The final assertion follows from the fact that the characteristic function
$\chi_{(a,b)}$ of an open interval can be arbitrarily well approximated by the continuous
function which is 0 outside $(a, b)$, 1 on $[a + \delta, b - \delta]$ and linear on the remaining
pieces. □


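Consider two measurable spaces $(X_1, \mathcal{M}_1)$ and $(X_2, \mathcal{M}_2)$. For a set
$E \in \mathcal{M}_1 \times \mathcal{M}_2$, fix $x_2 \in X_2$ and define the set $E_{x_2} = \{x_1 \in X_1 : (x_1, x_2) \in E\}$. Let
$\mathcal{F}$ be the family of sets $E \in \mathcal{M}_1 \times \mathcal{M}_2$ such that $E_{x_2} \in \mathcal{M}_1$. Then $\mathcal{F}$ contains
all sets of the form $E_1 \times E_2$, $E_j \in \mathcal{M}_j$, by the definition of product σ-algebras.
It is also easy to see that $\mathcal{F}$ is a σ-algebra. Hence $\mathcal{F} = \mathcal{M}_1 \times \mathcal{M}_2$, i.e. $E_{x_2} \in \mathcal{M}_1$
for every $E \in \mathcal{M}_1 \times \mathcal{M}_2$ and every $x_2 \in X_2$. A consequence of this is that if
$f : X_1 \times X_2 \to Y$ is $(\mathcal{M}_1 \times \mathcal{M}_2, \mathcal{N})$-measurable and we let $f_{x_2}(x_1) = f(x_1, x_2)$,
then for $B \in \mathcal{N}$, $(f_{x_2})^{-1}(B) = (f^{-1}(B))_{x_2} \in \mathcal{M}_1$, i.e. $f_{x_2}$ is $(\mathcal{M}_1, \mathcal{N})$-measurable.
Hence the following statements are well-defined.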


Theorem 6.7 Let $a, b \in \mathbb{R}$, $a < b$ and let $f : X \times [a,b] \to \mathbb{C}$ be
$(\mathcal{M} \times \mathcal{B}[a,b], \mathcal{B}(\mathbb{C}))$-measurable. Assume that $f(\cdot, t)$ is integrable for each $t \in [a,b]$
and let $F(t) = \int_X f(x,t)\,d\mu(x)$.

(a) If there exists a $g \in L^1(\mu)$ such that $|f(x,t)| \le g(x)$ for all $(x,t)$ and
$\lim_{t \to t_0} f(x,t) = f(x,t_0)$ for every $x$, then $\lim_{t \to t_0} F(t) = F(t_0)$. Consequently,
if $f$ is continuous, then so is $F$.

(b) If $f$ is partially differentiable w.r.t. $t$ and there exists a $g \in L^1(\mu)$ such that
$|(\partial f/\partial t)(x,t)| \le g(x)$ for all $(x,t)$, then

\[ F'(t) = \int_X \frac{\partial f}{\partial t}(x,t)\,d\mu(x). \]


Proof. Pick arbitrary $t_n$ converging to $t_0$, let $h_n(x) = f(x,t_n)$ and $h(x) = f(x,t_0)$
and use the DCT on $h_n$ and $h$. This gives (a). For (b), fix $t$, pick $t_n \to t$ with
$t_n \ne t$ and let instead $h_n(x) = (f(x,t_n) - f(x,t))/(t_n - t)$ and $h(x) = (\partial f/\partial t)(x,t)$.
Then $h_n \to h$ pointwise and the result follows on applying the DCT to $h_n$ and $h$;
this can be done since $|h_n(x)| \le \sup_t |(\partial f/\partial t)(x,t)| \le g(x)$, by the Mean Value
Theorem and the hypothesis. □
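Part (b) can be checked numerically on a simple case. The sketch below assumes the illustrative choice $X = [0,1]$ with Lebesgue measure and $f(x,t) = \sin(tx)$, for which $|\partial f/\partial t| = |x\cos(tx)| \le 1$, so $g \equiv 1$ dominates; the function names are hypothetical.

```python
import math

def integrate(f, a, b, steps=20_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

def F(t):
    # F(t) = integral over [0, 1] of f(x, t) = sin(t x)
    return integrate(lambda x: math.sin(t * x), 0.0, 1.0)

def integral_of_partial(t):
    # Integral over [0, 1] of (df/dt)(x, t) = x cos(t x)
    return integrate(lambda x: x * math.cos(t * x), 0.0, 1.0)

def difference_quotient(t, h=1e-5):
    # Symmetric difference quotient approximating F'(t)
    return (F(t + h) - F(t - h)) / (2 * h)

t = 2.0
print(difference_quotient(t), integral_of_partial(t))
```

The two printed numbers should agree up to discretisation error, in line with $F'(t) = \int_X (\partial f/\partial t)\,d\mu$.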


We are now going to see that any Riemann integrable function on a closed
interval $[a,b]$ is also Lebesgue integrable and that the results of the two integrals
are the same. The setting is thus that $(X, \mathcal{M}, \mu) = ([a,b], \mathcal{L}, m)$. Let $f$ be defined
on $X$ and bounded. Let $P = \{t_0, t_1, \ldots, t_n\}$, $a = t_0 < t_1 < \ldots < t_n = b$, be an
arbitrary finite set of points in $[a,b]$. Let

\[ m_j = m_j(P) = \inf_{t \in [t_{j-1}, t_j]} f(t), \qquad M_j = M_j(P) = \sup_{t \in [t_{j-1}, t_j]} f(t), \]

\[ s_P f = \sum_1^n m_j (t_j - t_{j-1}), \qquad S_P f = \sum_1^n M_j (t_j - t_{j-1}) \]

and

\[ \underline{I}(f) = \sup_P s_P f, \qquad \overline{I}(f) = \inf_P S_P f. \]

Then $f$ is said to be Riemann integrable if $\underline{I}(f) = \overline{I}(f)$ and $\int_a^b f(x)\,dx$ is defined
as the common value of the two.

For a given $P$, let $g_P = \sum_1^n m_j \chi_{(t_{j-1}, t_j]}$ and $G_P = \sum_1^n M_j \chi_{(t_{j-1}, t_j]}$. If $f$
is Riemann integrable, there are sets $P_k$ such that $P_1 \subset P_2 \subset \ldots$ and $s_{P_k} f \uparrow
\int_a^b f(x)\,dx$ and $S_{P_k} f \downarrow \int_a^b f(x)\,dx$ as $k \to \infty$. Since $g_{P_k}$ and $G_{P_k}$ are increasing and
decreasing respectively, there are limiting functions $g$ and $G$ satisfying $g \le f \le G$.
Since $g_{P_k}$ and $G_{P_k}$ are obviously Lebesgue measurable, so are $g$ and $G$. By
the DCT, $\int g\,dm = \int G\,dm = \int_a^b f(x)\,dx$. Hence $\int (G - g)\,dm = 0$, so $G = g$
a.e. which entails $f = G$ a.e. Since the Lebesgue measure is complete on $\mathcal{L}$, $f$ is
Lebesgue measurable and we get $\int f\,dm = \int_a^b f(x)\,dx$.
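The lower and upper sums $s_P f$ and $S_P f$ are easy to compute for a monotone function, which gives a concrete picture of the nested partitions $P_1 \subset P_2 \subset \ldots$ used above. The sketch below assumes an increasing $f$ (so that the infimum and supremum on each cell are attained at the endpoints) and uses dyadic partitions; the function name is hypothetical.

```python
def darboux_sums(f, a, b, n):
    """Lower and upper sums of an increasing f on the uniform partition of [a, b] with n cells."""
    h = (b - a) / n
    # For increasing f, the infimum on a cell is the left endpoint value
    # and the supremum is the right endpoint value.
    lower = sum(f(a + j * h) * h for j in range(n))
    upper = sum(f(a + (j + 1) * h) * h for j in range(n))
    return lower, upper

# Dyadic partitions are nested, so lower sums increase and upper sums decrease.
for k in range(1, 11):
    lower, upper = darboux_sums(lambda x: x * x, 0.0, 1.0, 2 ** k)
    print(2 ** k, lower, upper)
```

For $f(x) = x^2$ on $[0,1]$ both columns squeeze the common value $1/3$, and the gap $S_P f - s_P f$ equals $(f(1) - f(0))/n$.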

These results are summarized in Folland in Theorem 2.28. The results clearly
extend to improper integrals and to multiple integrals of functions on $\mathbb{R}^n$.


6.1 Expectation 


Let $(X, \mathcal{M}, P)$ be a probability space and let $\xi : (X, \mathcal{M}, P) \to (\mathbb{R}, \mathcal{B})$ be a random
variable. If $\xi$ is integrable, then the expectation of $\xi$ is

\[ E\xi := \int_X \xi\,dP. \]

For $A \in \mathcal{B}$, let

\[ P_\xi(A) = P\{x : \xi(x) \in A\}. \]

Then $P_\xi$ is a probability measure on $\mathcal{B}$. The next result shows that the expectation
can be computed by integration with respect to $P_\xi$.


Theorem 6.8 (The law of the unconscious statistician) Let $h : \mathbb{R} \to \mathbb{R}$ be a
Borel function and assume that $h \circ \xi$ is integrable. Then

\[ E h(\xi) = \int_{\mathbb{R}} h(t)\,dP_\xi(t). \]


Proof. Assume first that $h = \chi_B$ for a $B \in \mathcal{B}$. Then $h \circ \xi = \chi_B \circ \xi =
\chi_{\{x : \xi(x) \in B\}}$, so

\[ E h(\xi) = P\{\xi \in B\} = P_\xi(B) = \int \chi_B\,dP_\xi. \]

By linearity of integrals, the result now holds for all simple functions $h$. By the
MCT, the result then extends to all nonnegative $h$ and finally to all $h$ by linearity. □
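On a finite probability space both sides of Theorem 6.8 are finite sums, so the identity can be verified exactly. The sketch below assumes a hypothetical example (two fair dice, $\xi$ the sum of the faces, $h(t) = t^2$) and uses exact rational arithmetic.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Sample space: ordered pairs of faces of two fair dice, uniform probability.
omega = list(product(range(1, 7), repeat=2))
P = {w: Fraction(1, 36) for w in omega}

def xi(w):
    return w[0] + w[1]   # random variable: sum of the two faces

def h(t):
    return t * t         # Borel function applied to xi

# Left-hand side: integrate h(xi) over the sample space with respect to P.
lhs = sum(h(xi(w)) * P[w] for w in omega)

# Right-hand side: build the law P_xi on the real line, then integrate h against it.
law = defaultdict(Fraction)
for w in omega:
    law[xi(w)] += P[w]
rhs = sum(h(t) * p for t, p in law.items())

print(lhs, rhs)
```

Both computations give the same rational number, but the second never touches the sample space once the law $P_\xi$ is known.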


A corresponding result can be shown for measurable functions on any σ-finite
space.


6.2 The Monotone Class Theorem 


The version of the Monotone Class Theorem presented here is slightly different 
from, and more efficient than, the one in Folland. It is an extension of Dynkin’s 
Lemma and will allow us to make conclusions for all measurable functions on the 
basis of the corresponding conclusion for characteristic functions of the sets of a 
generating π-system.


Definition 6.9 Let $\mathcal{H}$ be a class of functions defined on the space $X$. Then $\mathcal{H}$ is
said to be a monotone class if

(i) $\mathcal{H}$ is a complex vector space,
(ii) $f \equiv 1 \Rightarrow f \in \mathcal{H}$,
(iii) $f_n \in \mathcal{H}$, $f_n \ge 0$, $f_n \uparrow f$, $f$ bounded $\Rightarrow f \in \mathcal{H}$.

Theorem 6.10 (The Monotone Class Theorem) Let $\mathcal{H}$ be a monotone class of
functions on $X$. Let $\mathcal{I} \subseteq \mathcal{P}(X)$ be a π-system and assume that $\chi_I \in \mathcal{H}$ for all
$I \in \mathcal{I}$. Then $\mathcal{H}$ contains all bounded complex $\sigma(\mathcal{I})$-measurable functions.


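Proof. Let $\mathcal{D} = \{A \in \sigma(\mathcal{I}) : \chi_A \in \mathcal{H}\}$. By the conditions on a monotone
class, $\mathcal{D}$ is a d-system containing $\mathcal{I}$. Hence $\chi_A \in \mathcal{H}$ for all $A \in \sigma(\mathcal{I})$ by Dynkin's
Lemma. Since $\mathcal{H}$ is a vector space, $\mathcal{H}$ then contains all simple functions. If $f$ is
bounded, nonnegative and $\sigma(\mathcal{I})$-measurable, then let $\phi_n \uparrow f$ for simple $\phi_n$. By (iii),
$f \in \mathcal{H}$. Finally $\mathcal{H}$ now must contain all bounded $\sigma(\mathcal{I})$-measurable functions, by (i). □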


6.3 Product measures 


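Let $(X_1, \mathcal{M}_1, \mu_1)$ and $(X_2, \mathcal{M}_2, \mu_2)$ be two measure spaces. Recall that

\[ \mathcal{M}_1 \times \mathcal{M}_2 = \sigma(\mathcal{I}) \]

where $\mathcal{I} = \{E_1 \times E_2 : E_j \in \mathcal{M}_j\}$. Observe that $\mathcal{I}$ is a π-system. Let $\mathcal{A}$ be the
family of finite disjoint unions of elements in $\mathcal{I}$. Then $\mathcal{A}$ is an algebra. This is not
immediately clear, but follows from the observation that $(E \times F)^c = (E^c \times X_2) \cup
(E \times F^c)$. Obviously $\sigma(\mathcal{A}) = \mathcal{M}_1 \times \mathcal{M}_2$. For a given set $\bigcup_1^n (E_k \times F_k) \in \mathcal{A}$, let

\[ \nu\Big(\bigcup_1^n (E_k \times F_k)\Big) = \sum_1^n \mu_1(E_k)\mu_2(F_k). \]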


Claim. $\nu$ is countably additive on $\mathcal{A}$.

Proof. It suffices to show that if $E_n \times F_n$, $n = 1, 2, \ldots$ are disjoint and
$\bigcup_1^\infty (E_n \times F_n) = E \times F$, then $\nu(E \times F) = \sum_1^\infty \nu(E_n \times F_n)$. We do this in
two steps. First fix an arbitrary $x_2 \in X_2$. Then

\[ \mu_1(E)\chi_F(x_2) = \int \chi_E(x_1)\chi_F(x_2)\,d\mu_1(x_1)
   = \sum_n \int \chi_{E_n}(x_1)\chi_{F_n}(x_2)\,d\mu_1(x_1) \]
\[ = \sum_n \chi_{F_n}(x_2) \int \chi_{E_n}(x_1)\,d\mu_1(x_1)
   = \sum_n \chi_{F_n}(x_2)\mu_1(E_n), \]

where the interchange of sum and integral follows from the MCT and the identity
$\chi_E(x_1)\chi_F(x_2) = \sum_n \chi_{E_n}(x_1)\chi_{F_n}(x_2)$. The second step is now the following computation.

\[ \mu_1(E)\mu_2(F) = \int \mu_1(E)\chi_F(x_2)\,d\mu_2(x_2)
   = \sum_n \mu_1(E_n) \int \chi_{F_n}(x_2)\,d\mu_2(x_2)
   = \sum_n \mu_1(E_n)\mu_2(F_n), \]

where the second equality is the MCT and step 1. □


By Carathéodory's Extension Theorem, $\nu$ extends to a measure $\mu$ on $\mathcal{M}_1 \times
\mathcal{M}_2$. The standard notation for this measure is $\mu = \mu_1 \times \mu_2$.

The construction of the product measure obviously extends to a finite product
of measure spaces. It also works for a countable number of spaces after modifying
the π-system $\mathcal{I}$ to $\mathcal{I} = \{\prod_1^n E_j \times \prod_{j > n} X_j : E_j \in \mathcal{M}_j,\ n = 1, 2, \ldots\}$. This extension is
most natural when the $\mu_j$'s are probability measures.

Recalling how a measure is constructed from a countably additive set function
on an algebra via an outer measure, we find that

\[ (\mu_1 \times \mu_2)(A) = \inf\Big\{ \sum_{j=1}^\infty \mu_1(E_j)\mu_2(F_j) : E_j \in \mathcal{M}_1,\ F_j \in \mathcal{M}_2,\ A \subseteq \bigcup_{j=1}^\infty (E_j \times F_j) \Big\}. \]

Applying this to the two-dimensional Lebesgue measure, some useful analogs of
approximation results for one-dimensional Lebesgue measure follow. Let $m$ be
the two-dimensional Lebesgue measure. Then

\[ m(A) = \inf\{m(U) : U \supseteq A,\ U \text{ open}\} = \sup\{m(K) : K \subseteq A,\ K \text{ compact}\}. \]

This is part (a) of Folland's Theorem 2.40. We will also need Theorem 2.40(c)
which states that for any set $E$ with $m(E) < \infty$ and $\varepsilon > 0$, one can find a set $A$,
which is a finite union of rectangles, such that $m(A \Delta E) < \varepsilon$. This result is
analogous to Proposition 3.16 as is its proof.

By mimicking the proof of Theorem 6.6 one also gets

Theorem 6.11 If $f \in L^1(m)$ and $\varepsilon > 0$, then there exists a simple function
$\phi = \sum_1^n a_j \chi_{R_j}$, where the $R_j$'s are rectangles, such that

\[ \int |f - \phi|\,dm < \varepsilon. \]

There is also a continuous function $g : \mathbb{R}^2 \to \mathbb{R}$ with bounded support, such that

\[ \int |f - g|\,dm < \varepsilon. \]

Of course, these results extend to Lebesgue measure and Lebesgue measurable
functions on $\mathbb{R}^n$ for arbitrary $n = 2, 3, 4, \ldots$.


Example. Construction of a sequence of independent random variables. For
each $n = 1, 2, \ldots$, let $(X_n, \mathcal{M}_n, P_n) = ([0,1], \mathcal{L}, m)$ and let $\xi_n : X_n \to \mathbb{R}$ be a
random variable with desired distribution, constructed as in earlier examples. Let
$(X, \mathcal{M}, P) = (\prod_1^\infty X_n, \prod_1^\infty \mathcal{M}_n, \prod_1^\infty P_n)$ and set $\eta_n(x_1, x_2, \ldots) = \xi_n(x_n)$. Then,
by the construction of product measure, letting $E_n = \{x_n \in X_n : \xi_n(x_n) \in B\}$
and $E_j = X_j$ for $j \ne n$,

\[ P(\eta_n \in B) = P\Big(\prod_1^\infty E_j\Big) = P_n(\xi_n \in B). \]

Similarly it follows that the $\eta_n$'s are independent on the π-system consisting of
sets of the form $\bigcap_1^\infty \{x : \eta_n(x) \in (-\infty, b_n)\}$ with $b_n = \infty$ for all but finitely
many $n$ and hence on the whole of $\sigma(\eta_1, \eta_2, \ldots)$ by Corollary 4.14. (In fact, we
made precisely this observation in the example following Corollary 4.14.) □


The next question in focus will be when it is possible to change the order
of integration for a double integral. First, however, some work is required to
establish that the question makes sense. Let $(X_j, \mathcal{M}_j, \mu_j)$, $j = 1, 2$, be finite
measure spaces and let $(X, \mathcal{M}, \mu)$ be the product space. Let $f$ be a complex- or
$\mathbb{R}_+$-valued Borel function on $X$.


Lemma 6.12 For every $x_1 \in X_1$ and $x_2 \in X_2$, $f(\cdot, x_2)$ and $f(x_1, \cdot)$ are $\mathcal{M}_1$-measurable
and $\mathcal{M}_2$-measurable respectively.

Proof. Obviously it suffices to check the first statement. Let

\[ \mathcal{H} = \{f : f(\cdot, x_2) \text{ is } \mathcal{M}_1\text{-measurable}\}. \]

Let

\[ \mathcal{I} = \{E \times F : E \in \mathcal{M}_1,\ F \in \mathcal{M}_2\}. \]

Then $\mathcal{I}$ is a π-system that generates $\mathcal{M}$. Pick $E \times F \in \mathcal{I}$, let $f = \chi_{E \times F}$ and
$g = f(\cdot, x_2)$. Pick a set $B \in \mathcal{B}(\mathbb{C})$. Suppose $1 \in B$ and $0 \notin B$. If $x_2 \in F$, then
$g^{-1}(B) = E$ and if $x_2 \notin F$, then $g^{-1}(B) = \emptyset$. Hence $g^{-1}(B)$ and $g^{-1}(B^c)$ are
measurable. In case $B$ contains both 0 and 1, $g^{-1}(B) = X_1$. We conclude that
$g$ is measurable. Since $\mathcal{H}$ is a monotone class, $\mathcal{H}$ contains all bounded
$\mathcal{M}$-measurable functions. Since limits of measurable functions are measurable,
approximating by a sequence of simple functions now shows that $\mathcal{H}$ contains all
$\mathcal{M}$-measurable functions as desired. □


By this lemma, the two functions $g : X_1 \to \mathbb{C}$ and $h : X_2 \to \mathbb{C}$ given by

\[ g(x_1) = \int_{X_2} f(x_1, x_2)\,d\mu_2(x_2), \qquad h(x_2) = \int_{X_1} f(x_1, x_2)\,d\mu_1(x_1) \]

are well-defined.


Lemma 6.13 The functions $g$ and $h$ are measurable.

Proof. Let $\mathcal{H}$ be the family of $f$'s such that the corresponding $g$ and $h$ are
measurable. If $f \equiv 1$, then $g = \mu_2(X_2)$ and $h = \mu_1(X_1)$ are finite constants and
hence measurable. Since measurability is closed under linear operations, $\mathcal{H}$ is a
vector space. If $f_n \in \mathcal{H}$, $f_n \uparrow f$, $f_n \ge 0$ and $f$ is bounded, then $f \in \mathcal{H}$ by the MCT.
Thus $\mathcal{H}$ is a monotone class.

Now if $f = \chi_{E \times F}$, $E \in \mathcal{M}_1$, $F \in \mathcal{M}_2$, then $g(x_1) = 0$ if $x_1 \notin E$ and $g(x_1) = \mu_2(F)$ if
$x_1 \in E$, i.e. $g = \mu_2(F)\chi_E$, which is clearly measurable; $h$ is treated in the same way.
Hence $\mathcal{H}$ contains all bounded $\mathcal{M}$-measurable functions by the Monotone Class
Theorem. Now extend to all $f$ by the MCT and linearity. □


By Lemma 6.13, it makes sense to define

\[ \int_{X_1} \Big( \int_{X_2} f(x_1, x_2)\,d\mu_2(x_2) \Big)\,d\mu_1(x_1) \]

and

\[ \int_{X_2} \Big( \int_{X_1} f(x_1, x_2)\,d\mu_1(x_1) \Big)\,d\mu_2(x_2). \]

However, are they equal? Also, how do they relate to $\int_{X_1 \times X_2} f\,d(\mu_1 \times \mu_2)$?

Theorem 6.14 (Tonelli's Theorem) If $f \in L^+(X, \mathcal{M}, \mu)$, then

\[ \int f\,d\mu = \int_{X_1} \Big( \int_{X_2} f(x_1, x_2)\,d\mu_2(x_2) \Big)\,d\mu_1(x_1)
   = \int_{X_2} \Big( \int_{X_1} f(x_1, x_2)\,d\mu_1(x_1) \Big)\,d\mu_2(x_2). \]

Proof. Let $\mathcal{H}$ be the class of bounded $f$ for which the statement holds. Then
$\chi_{E \times F} \in \mathcal{H}$ for every $E \in \mathcal{M}_1$, $F \in \mathcal{M}_2$, since all three expressions are then equal to
$\mu_1(E)\mu_2(F)$. Taking $E = X_1$ and $F = X_2$ shows that $1 \in \mathcal{H}$. Since $\mu_1$ and $\mu_2$
are finite, $\mathcal{H}$ is a complex vector space. By the MCT, it now follows that $\mathcal{H}$ is a
monotone class. Hence $\mathcal{H}$ contains all bounded $\mathcal{M}$-measurable functions, by the
Monotone Class Theorem. The proof is now completed via another appeal to the MCT. □

For $f \in L^1(X, \mathcal{M}, \mu)$, Tonelli's Theorem together with linearity of integrals
shows:

Theorem 6.15 (Fubini's Theorem) If $f \in L^1(X, \mathcal{M}, \mu)$, then

\[ \int f\,d\mu = \int_{X_1} \Big( \int_{X_2} f(x_1, x_2)\,d\mu_2(x_2) \Big)\,d\mu_1(x_1)
   = \int_{X_2} \Big( \int_{X_1} f(x_1, x_2)\,d\mu_1(x_1) \Big)\,d\mu_2(x_2). \]


It is very useful to note that in order to check that a given function f is in- 
tegrable with respect to the product measure, one can use Tonelli’s Theorem on 
|f| to do the integration in the most convenient order and check if the resulting 
integral is finite. 


By countable additivity, Tonelli's and Fubini's Theorems extend to σ-finite
measure spaces. However, they do not extend beyond that. Consider for example
$X_1 = X_2 = [0,1]$, $\mathcal{M}_1 = \mathcal{M}_2 = \mathcal{B}[0,1]$, $\mu_1 = m$ and $\mu_2$ counting measure (i.e.
$\mu_2(F)$ is the number of points in $F$, so that $\mu_2$ is infinite for all infinite sets). Note
that $\mu_2$ is not σ-finite. Let $A$ be the diagonal, i.e. $A = \{(x,x) : x \in [0,1]\}$. (Why
is $A \in \mathcal{B} \times \mathcal{B}$?) Then

\[ \int_{X_1} \Big( \int_{X_2} \chi_A(x_1, x_2)\,d\mu_2(x_2) \Big)\,d\mu_1(x_1) = 1 \]

since the inner integral is constantly 1, whereas

\[ \int_{X_2} \Big( \int_{X_1} \chi_A(x_1, x_2)\,d\mu_1(x_1) \Big)\,d\mu_2(x_2) = 0 \]

since the inner integral is constantly 0 in this case. (Exercise: What is $\int_X \chi_A\,d\mu$?)


In Fubini's Theorem, the integrability condition is also necessary. For an
example that demonstrates this, let $X_1$ and $X_2$ both be the set of natural numbers and
$\mu_1$ and $\mu_2$ both counting measure. Let $A$ be the diagonal $\{(k,k) : k = 1, 2, \ldots\}$
and $B$ the off-diagonal $\{(k, k+1) : k = 1, 2, \ldots\}$. Letting $f = \chi_A - \chi_B$, we get
$\int_{X_1} \int_{X_2} f\,d\mu_2\,d\mu_1 = 0$ and $\int_{X_2} \int_{X_1} f\,d\mu_1\,d\mu_2 = 1$, whereas $\int_X f\,d\mu$ is undefined.
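The last example can be checked by direct summation, since integrals against counting measure are sums. The sketch below is only an illustration with hypothetical function names. Each inner sum is computed over the full (finite) support of the row or column, and only the outer sum is truncated; truncating both sums to a square would hide the effect, because finite double sums always commute.

```python
def f(k, j):
    """f = chi_A - chi_B with A the diagonal and B = {(k, k+1)}."""
    if j == k:
        return 1
    if j == k + 1:
        return -1
    return 0

def inner_over_x2(k):
    # f(k, .) vanishes outside {k, k+1}, so this finite sum is the full inner integral.
    return sum(f(k, j) for j in range(1, k + 2))

def inner_over_x1(j):
    # f(., j) vanishes outside {j-1, j}, so this finite sum is the full inner integral.
    return sum(f(k, j) for k in range(1, j + 1))

def iterated_x2_then_x1(n):
    return sum(inner_over_x2(k) for k in range(1, n + 1))

def iterated_x1_then_x2(n):
    return sum(inner_over_x1(j) for j in range(1, n + 1))

def abs_mass(n):
    # Integral of |f| over the n-by-n square; it grows without bound.
    return sum(abs(f(k, j)) for k in range(1, n + 1) for j in range(1, n + 1))

print(iterated_x2_then_x1(50), iterated_x1_then_x2(50), abs_mass(50))
```

The two iterated sums stay at 0 and 1 for every truncation level, while the mass of $|f|$ diverges, which is exactly the failure of the hypothesis $f \in L^1$.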


7 Signed measures 


Let $(X, \mathcal{M})$ be a measurable space and let $\nu : \mathcal{M} \to \overline{\mathbb{R}}$.

Definition 7.1 The function $\nu$ is said to be a signed measure if

• $\nu$ assumes at most one of the values $\infty$ and $-\infty$,

• $\nu(\bigcup_1^\infty E_n) = \sum_1^\infty \nu(E_n)$ whenever $E_n \in \mathcal{M}$ are disjoint and the sum
converges absolutely if $\nu(\bigcup_1^\infty E_n)$ is finite.


Sometimes when we speak of a measure in a context where also some signed 
measure appears, we will refer to the measure as a positive measure to make the 
distinction clear. 


Example. If $\mu_1$ and $\mu_2$ are two measures on $\mathcal{M}$ and at least one of them is finite,
then $\mu_1 - \mu_2$ is a signed measure. □

Example. If $f$ is real-valued and $\mathcal{M}$-measurable and at least one of $f^+$ and $f^-$
is integrable, then

\[ \nu(E) = \int_E f\,d\mu \]

defines a signed measure. A function of this kind is called an extended integrable
function. □


Proposition 7.2 Let $\nu$ be a signed measure. If $E_n \uparrow E$, then $\nu(E_n) \to \nu(E)$. If
$E_n \downarrow E$ and $\nu(E_1)$ is finite, then $\nu(E_n) \to \nu(E)$.

Proof. Let $F_n = E_n \setminus E_{n-1}$ (with $E_0 = \emptyset$) so that the $F_n$'s are disjoint and
$E = \bigcup_1^\infty F_n$. Then, exactly as in the positive measure case,

\[ \nu(E) = \sum_1^\infty \nu(F_n) = \lim_N \sum_1^N \nu(F_n) = \lim_N \nu(E_N). \]

The second part also goes through exactly as for positive measures. □


7.1 Jordan-Hahn Decompositions 


Definition 7.3 Let $\nu$ be a signed measure. A set $E$ is said to be a positive set for
$\nu$ if $\nu(F) \ge 0$ whenever $F$ is measurable and $F \subseteq E$. A negative set is defined
analogously. If $E$ is both positive and negative for $\nu$, then $E$ is said to be a null
set for $\nu$.

It is obvious from the definition that any subset of a positive/negative set is
positive/negative. It is also clear that the union and the intersection of two
positive/negative sets are positive/negative.


Lemma 7.4 Let $P_1, P_2, \ldots$ be positive sets for the signed measure $\nu$. Then
$P = \bigcup_1^\infty P_n$ is also positive.

Proof. Let $Q_1 = P_1$ and $Q_n = P_n \setminus \bigcup_1^{n-1} P_j$, so that the $Q_n$'s are disjoint and
$\bigcup_1^\infty Q_n = P$. Then each $Q_n$ is positive, so for any measurable $E \subseteq P$, $\nu(E \cap Q_n) \ge 0$. Hence

\[ \nu(E) = \sum_1^\infty \nu(E \cap Q_n) \ge 0 \]

by countable additivity of $\nu$. □


The next result states that given a signed measure, the space can be partitioned 
into a positive and a negative part, in an essentially unique way. 


Theorem 7.5 (The Hahn Decomposition Theorem) Let $\nu$ be a signed measure
on $(X, \mathcal{M})$. Then there is a positive set $P$ and a negative set $N$ such that $X =
P \cup N$. If $P'$ and $N'$ are two other such sets, then $P \Delta P'$ and $N \Delta N'$ are null for
$\nu$.

Proof. Assume without loss of generality that $\nu$ does not assume the value
$+\infty$. Let $m = \sup\{\nu(E) : E \text{ positive}\}$. Pick a sequence $\{P_j\}$ of positive sets
such that $\nu(P_j) \to m$. Since positivity is closed under finite unions, we may
assume that the $P_j$'s are increasing. Let $P = \bigcup_1^\infty P_j$. By Lemma 7.4, $P$ is positive,
and by Proposition 7.2, $\nu(P_j) \to \nu(P)$, so $\nu(P) = m$.

Let $N = X \setminus P$. We claim that $N$ is negative. Assume for contradiction that $N$
is not negative. Observe that there can be no positive subset $E$ of $N$ with $\nu(E) > 0$,
since that would imply that $P \cup E$ is positive and $\nu(P \cup E) = \nu(P) + \nu(E) > m$,
contradicting the definition of $m$. Hence there must be an $E \subseteq N$ with $\nu(E) > 0$,
but $E$ not positive. This means that there is an $F \subseteq E$ with $\nu(F) < 0$. This
implies that $\nu(E \setminus F) > \nu(E)$. Iterating this observation will lead to the desired
contradiction.

Let $n_1$ be the smallest positive integer such that there exists an $A_1 \subseteq N$ with
$\nu(A_1) > 1/n_1$. Pick such an $A_1$. Since $A_1$ is not positive, we can let $n_2$ be
the smallest positive integer such that there exists an $A_2 \subseteq A_1$ with $\nu(A_2) >
\nu(A_1) + 1/n_2$. Pick such an $A_2$. Since $\nu(A_2) > 0$ and $A_2 \subseteq N$, $A_2$ is not
positive. Iterate the procedure to produce smallest possible integers $n_3, n_4, \ldots$ and
sets $A_3, A_4, \ldots$ with $\nu(A_k) > \sum_1^k n_j^{-1}$. Let $A = \bigcap_1^\infty A_n$. Recall our assumption that
$\nu$ does not take on the value $+\infty$. Consequently $\nu(A_1) < \infty$ and $\nu(A) < \infty$. Hence
Proposition 7.2 implies that $\nu(A_n) \to \nu(A)$ so that

\[ \sum_{j=1}^\infty \frac{1}{n_j} \le \nu(A) < \infty. \]

From this it follows in particular that $\lim_j n_j = \infty$. However $A \subseteq N$ and $\nu(A) > 0$, so $A$
is not positive. Thus there exists a positive integer $n$ and a $B \subseteq A$ such that
$\nu(B) > \nu(A) + 1/n$. Since $n_j \to \infty$, $n_{j+1} > n$ for large enough $j$. Since the $\nu(A_j)$'s
increase to $\nu(A)$, we get $\nu(B) > \nu(A) + 1/n \ge \nu(A_j) + 1/n$ and $B \subseteq A \subseteq A_j$. This
contradicts the choice of $n_{j+1}$ as the smallest integer for which such a subset of $A_j$ exists.

Finally if $P' \cup N'$ is another partition into a positive and a negative set, then
$P \setminus P' \subseteq P \cap N'$ and is hence null. Analogously $P' \setminus P$, $N \setminus N'$ and $N' \setminus N$ are
null. □


A partition of the space $X$ into the sets $P$ and $N$, as in the Hahn Decomposition
Theorem, is called a Hahn decomposition (with respect to $\nu$).

Definition 7.6 If $\nu_1$ and $\nu_2$ are two signed measures on $(X, \mathcal{M})$, then they are
said to be mutually singular (or just singular) if there exist disjoint $E, F \in \mathcal{M}$ such that
$E \cup F = X$, $E$ is null for $\nu_2$ and $F$ is null for $\nu_1$.

In words, $\nu_1$ and $\nu_2$ are singular if they live on disjoint parts of $X$. When $\nu_1$ and
$\nu_2$ are singular, this is denoted by $\nu_1 \perp \nu_2$. It follows from the Hahn Decomposition
Theorem that any signed measure $\nu$ can be written as the difference of two
positive measures. These are mutually singular and unique. This is summarized
in the following result.


Theorem 7.7 (The Jordan Decomposition Theorem) Let $\nu$ be a signed measure
on $(X, \mathcal{M})$. Then there exist two unique positive measures $\nu^+$ and $\nu^-$ such that
$\nu = \nu^+ - \nu^-$ and $\nu^+ \perp \nu^-$.

Proof. Let $X = P \cup N$ be a Hahn decomposition with respect to $\nu$ and let

\[ \nu^+(E) = \nu(E \cap P), \qquad \nu^-(E) = -\nu(E \cap N), \]

$E \in \mathcal{M}$. Then $\nu^+$ and $\nu^-$ are positive, singular and $\nu = \nu^+ - \nu^-$. It remains to
prove uniqueness. Assume that $\nu$ can also be written as $\nu = \mu^+ - \mu^-$ for two other
positive singular measures $\mu^+$ and $\mu^-$. Then there are disjoint sets $E, F \in \mathcal{M}$
such that $E \cup F = X$ and $\mu^+(F) = \mu^-(E) = 0$. Hence $E \cup F$ is another Hahn
decomposition of $X$ and hence $P \Delta E$ is null for $\nu$. Therefore, for any $A \in \mathcal{M}$,

\[ \mu^+(A) = \mu^+(A \cap E) = \nu(A \cap E) = \nu(A \cap P) = \nu^+(A). \]

Thus $\mu^+ = \nu^+$ and analogously $\mu^- = \nu^-$. □

A decomposition of a signed measure in this way is called a Jordan decomposition
or a Jordan-Hahn decomposition. The measures $\nu^+$ and $\nu^-$ are called
the positive variation of $\nu$ and the negative variation of $\nu$ respectively. The total
variation of $\nu$ is the measure $|\nu| := \nu^+ + \nu^-$. The integral with respect to the
signed measure $\nu$ is given by

\[ \int f\,d\nu = \int f\,d\nu^+ - \int f\,d\nu^-, \qquad f \in L^1(|\nu|). \]

We say that $\nu$ is finite if $|\nu|$ is finite and we say that $\nu$ is σ-finite if $|\nu|$ is σ-finite.
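On a finite set every signed measure is given by point weights, and the Hahn and Jordan decompositions can be read off directly. The sketch below assumes a hypothetical four-point space with made-up weights; it is only meant to make the objects $P$, $N$, $\nu^+$, $\nu^-$ and $|\nu|$ concrete.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical signed measure on {a, b, c, d}, given by point weights.
weights = {"a": Fraction(3), "b": Fraction(-1), "c": Fraction(0), "d": Fraction(-5, 2)}

def nu(E):
    return sum(weights[x] for x in E)

# One Hahn decomposition: P collects the points of nonnegative weight.
# (The null point "c" could equally well be placed in N.)
P = {x for x, w in weights.items() if w >= 0}
N = set(weights) - P

def nu_plus(E):
    return nu(set(E) & P)

def nu_minus(E):
    return -nu(set(E) & N)

def total_variation(E):
    return nu_plus(E) + nu_minus(E)

def subsets(points):
    points = list(points)
    for size in range(len(points) + 1):
        for combo in combinations(points, size):
            yield set(combo)

print(sorted(P), sorted(N), total_variation(set(weights)))
```

Moving the null point "c" from $P$ to $N$ gives another Hahn decomposition, but leaves $\nu^+$ and $\nu^-$ unchanged, in line with the uniqueness statements above.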


7.2 The Lebesgue-Radon-Nikodym Theorem 


Let $\nu$ be a signed measure and $\mu$ a positive measure on $(X, \mathcal{M})$.

Definition 7.8 If $\nu(E) = 0$ whenever $E \in \mathcal{M}$ and $\mu(E) = 0$, then $\nu$ is said to be
absolutely continuous with respect to $\mu$, denoted $\nu \ll \mu$.

Immediate consequences of the definition are that $\nu \ll \mu$ iff ($\nu^+ \ll \mu$ and
$\nu^- \ll \mu$) iff $|\nu| \ll \mu$ and that ($\nu \ll \mu$ and $\nu \perp \mu$) iff $\nu = 0$.

Example. Let $\xi : (X, \mathcal{M}, P) \to (\mathbb{R}, \mathcal{B}(\mathbb{R}))$ be a random variable. Recall the
measure $P_\xi$ on $\mathcal{B}$ given by $P_\xi(B) = P\{\xi \in B\}$. If $P_\xi \ll m$, then $\xi$ is said to be a
continuous random variable. □


The classical Radon-Nikodym Theorem states that whenever $\nu \ll \mu$, there
exists an $\mathcal{M}$-measurable function $f$ such that

\[ \nu(E) = \int_E f\,d\mu, \qquad E \in \mathcal{M}, \]

provided that $\mu$ and $\nu$ are σ-finite. The Lebesgue-Radon-Nikodym Theorem
(LRNT) provides even more information. Before that, a preparatory lemma is
required.


Lemma 7.9 Assume that $\nu$ and $\mu$ are two finite measures on $(X, \mathcal{M})$. Then either
$\nu \perp \mu$ or there exists $\varepsilon > 0$ and $E \in \mathcal{M}$ such that $\mu(E) > 0$ and $E$ is positive for
$\nu - \varepsilon\mu$.

Proof. Let $P_n \cup N_n$ be a Hahn decomposition for $\nu - n^{-1}\mu$, $n = 1, 2, \ldots$. Write
$P = \bigcup_n P_n$ and $N = \bigcap_n N_n$, so that $P_n \uparrow P$ and $N_n \downarrow N$. Since $N$ is negative
for $\nu - n^{-1}\mu$, $\nu(N) \le n^{-1}\mu(N)$ for all $n$ and since $\mu$ is finite, this implies that
$\nu(N) = 0$. If $\mu(P) = 0$, then $\mu \perp \nu$. If $\mu(P) > 0$, then $\mu(P_k) > 0$ for some $k$ by
continuity of measures. Now take $E = P_k$ and $\varepsilon = 1/k$. □


Theorem 7.10 (The Lebesgue-Radon-Nikodym Theorem) Let $\nu$ be a signed
measure and $\mu$ a positive measure on $(X, \mathcal{M})$, both σ-finite. Then

(a) there exist unique σ-finite signed measures $\lambda$ and $\rho$ such that

\[ \lambda \perp \mu, \qquad \rho \ll \mu, \qquad \nu = \lambda + \rho, \]

(b) there exists an extended $\mu$-integrable function $f$ such that

\[ \rho(E) = \int_E f\,d\mu \]

for all $E \in \mathcal{M}$. If $g$ is another such function, then $f = g$ $\mu$-a.e.

Proof. We do this for $\nu$, $\mu$ finite positive measures; the extensions are
straightforward. The uniqueness parts are left for exercises (or reading in Folland).

Let $\mathcal{F}$ be the set of $\mathcal{M}$-measurable nonnegative functions $f$ such that $\int_E f\,d\mu \le
\nu(E)$ for all $E \in \mathcal{M}$. Then $\mathcal{F}$ is nonempty (since at least $0 \in \mathcal{F}$) and $\mathcal{F}$ is closed
under finite maxima, since if $f, g \in \mathcal{F}$, then

\[ \int_E f \vee g\,d\mu = \int_{E \cap A} f\,d\mu + \int_{E \cap A^c} g\,d\mu \le \nu(E \cap A) + \nu(E \cap A^c) = \nu(E) \]

where $A = \{x : f(x) > g(x)\}$. Let $a = \sup\{\int f\,d\mu : f \in \mathcal{F}\}$. Note that $a \le
\nu(X) < \infty$. Pick $f_n \in \mathcal{F}$ such that $\int f_n\,d\mu \to a$. Letting $g_n = \max(f_1, \ldots, f_n)$
we get that $g_n \uparrow g := \sup_n f_n$ pointwise, so that the MCT implies that

\[ \int g\,d\mu = \lim_n \int g_n\,d\mu = a. \]

The MCT, applied to $g_n \chi_E$ for each $E \in \mathcal{M}$, also implies that $g \in \mathcal{F}$. Hence the
set function $\lambda$ defined by

\[ \lambda(E) = \nu(E) - \int_E g\,d\mu \]

is a positive measure. Set $\rho(E) = \int_E g\,d\mu$. Then we are done if we can prove that
$\lambda$ and $\mu$ are singular. If not, Lemma 7.9 implies that we can find $E$ with $\mu(E) > 0$
and $\varepsilon > 0$ such that $\lambda \ge \varepsilon\mu$ on $E$. However then for any $F \in \mathcal{M}$,

\[ \int_F (g + \varepsilon\chi_E)\,d\mu = \int_F g\,d\mu + \varepsilon\mu(E \cap F) \le \int_F g\,d\mu + \lambda(F) = \nu(F), \]

i.e. $g + \varepsilon\chi_E \in \mathcal{F}$, a contradiction since $\int (g + \varepsilon\chi_E)\,d\mu = a + \varepsilon\mu(E) > a$. □


Writing $\nu = \lambda + \rho$ with $\lambda \perp \mu$ and $\rho \ll \mu$ is called the Lebesgue decomposition
of $\nu$.

In case $\nu \ll \mu$, the LRNT states that $\nu(E) = \int_E f\,d\mu$ for all $E \in \mathcal{M}$,
i.e. the Radon-Nikodym Theorem. It is common to write $f = d\nu/d\mu$, the reason
of course being that the notation in itself suggests the property that defines the
function $f$, namely that $\int_E (d\nu/d\mu)\,d\mu = \int_E d\nu$ for all $E$. "Multiplying" by $d\mu$,
one also writes $d\nu = f\,d\mu$. The function $d\nu/d\mu$ is called the Radon-Nikodym
derivative of $\nu$ with respect to $\mu$.

Note that the LRNT works fine even if it is assumed that $\nu$ is a signed measure;
just Jordan decompose $\nu$ and use the LRNT on $\nu^+$ and $\nu^-$.
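For measures given by point masses on a finite set, the Lebesgue decomposition and the Radon-Nikodym derivative can be written down explicitly, which may help to fix ideas. The space and the weights below are hypothetical; the sketch only illustrates the statement of the LRNT, not its proof.

```python
from fractions import Fraction

# Hypothetical point masses on a four-point space; mu gives no mass to "c".
mu = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(0), "d": Fraction(1, 4)}
nu = {"a": Fraction(1), "b": Fraction(0), "c": Fraction(2), "d": Fraction(3)}

# Density of the absolutely continuous part: f = nu/mu where mu > 0.
# The value of f on the mu-null set {mu = 0} is arbitrary; 0 is chosen here.
f = {x: (nu[x] / mu[x] if mu[x] > 0 else Fraction(0)) for x in mu}

def measure(weights, E):
    return sum(weights[x] for x in E)

def rho(E):
    """rho(E) = integral of f over E with respect to mu."""
    return sum(f[x] * mu[x] for x in E)

def lam(E):
    """Singular part: the mass of nu carried by the mu-null set {mu = 0}."""
    return sum(nu[x] for x in E if mu[x] == 0)

X = set(mu)
print(f, lam(X), rho(X))
```

Here $\lambda$ lives on $\{c\}$, which is $\mu$-null, while $\rho = f\,d\mu$ lives on the complement, so $\lambda \perp \mu$ and $\rho \ll \mu$.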




The most important applications of the LRNT are the Fundamental Theorem 
of Calculus and the Integration by Parts formula for Lebesgue integrals. We will 
come back to those shortly. Another fundamental application is the concept of 
conditional expectation in probability theory. 


Example. (Conditional Expectation) Let $f \in L^1(X, \mathcal{M}, \mu)$, $\mu$ σ-finite. Define
$\nu(E) = \int_E f\,d\mu$, $E \in \mathcal{M}$. Then $\nu$ is a finite signed measure such that $\nu \ll \mu$.
Now let $\mathcal{N}$ be a sub-σ-algebra of $\mathcal{M}$. Then obviously $\nu|_{\mathcal{N}} \ll \mu|_{\mathcal{N}}$. By the LRNT,
this entails that there exists a function $g \in L^1(X, \mathcal{N}, \mu|_{\mathcal{N}})$ such that

\[ \nu(E) = \int_E g\,d\mu \]

for all $E \in \mathcal{N}$, i.e.

\[ \int_E f\,d\mu = \int_E g\,d\mu \]

for all $E \in \mathcal{N}$. This provides the base for the definition of conditional expectation,
as follows.

Let $(X, \mathcal{M}, P)$ be a probability space and $\xi$ and $\eta$ integrable random variables.
We would like to find a sensible, proper definition of the conditional expectation
$E[\xi|\eta]$. Clearly, writing $\psi = E[\xi|\eta]$, $\psi$ should be a random variable which is a
function of $\eta$. In other words, $\psi$ should be a $\sigma(\eta)$-measurable function. Now, it
is intuitively fairly clear that the conditional expectation of $\xi$ given an event $A$
should satisfy

\[ E[\xi|A] = \frac{1}{P(A)} \int_A \xi\,dP \]

for any $A$ such that $P(A) > 0$. Since $\psi = E[\xi|\eta]$ should equal $E[\xi|\eta \in B]$ if we
are told that $\eta \in B$ (for some $B \in \mathcal{B}(\mathbb{R})$) and no more, we should have

\[ \int_{\{\eta \in B\}} \psi\,dP = \int_{\{\eta \in B\}} \xi\,dP \]

for all $B \in \mathcal{B}$, i.e.

\[ \int_A \psi\,dP = \int_A \xi\,dP \]

for all $A \in \sigma(\eta)$. This is the criterion that is used for the formal definition. □


Definition 7.11 Let $\mathcal{N}$ be a sub-σ-algebra of $\mathcal{M}$ and $\xi$ an integrable random
variable. Then $\psi$ is said to be (a version of) a conditional expectation of $\xi$ given
$\mathcal{N}$ if $\psi$ is $\mathcal{N}$-measurable and

\[ \int_A \psi\,dP = \int_A \xi\,dP \]

for all $A \in \mathcal{N}$.

By the above observations, the existence of such a $\psi$ follows from the LRNT.
Note that two versions of the conditional expectation must be equal a.s. (exercise).
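When $\mathcal{N}$ is generated by a finite partition, a version of the conditional expectation is simply the block-wise average of $\xi$, and the defining identity of Definition 7.11 can be checked by hand. The sketch below assumes a hypothetical example (a fair die, $\xi(\omega) = \omega$, and the partition $\{1,2\}$, $\{3,4,5,6\}$).

```python
from fractions import Fraction

# Hypothetical finite probability space: a fair die, with xi(omega) = omega.
P = {omega: Fraction(1, 6) for omega in range(1, 7)}
xi = {omega: Fraction(omega) for omega in P}

# Sub-sigma-algebra N generated by a partition of the sample space.
partition = [{1, 2}, {3, 4, 5, 6}]

def integral(g, A):
    """Integral of g over the event A with respect to P."""
    return sum(g[omega] * P[omega] for omega in A)

def conditional_expectation(g, blocks):
    """On each block, psi equals the P-weighted average of g over that block."""
    psi = {}
    for block in blocks:
        average = integral(g, block) / sum(P[omega] for omega in block)
        for omega in block:
            psi[omega] = average
    return psi

psi = conditional_expectation(xi, partition)
print(psi[1], psi[3])
```

The function `psi` is constant on each block, hence $\mathcal{N}$-measurable, and its integral over any union of blocks matches that of $\xi$.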


Here are a few more results on the validity of the $d\nu/d\mu$-notation.


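Proposition 7.12 Assume that $\mu$, $\nu$ and $\lambda$ are σ-finite measures, $\nu \ll \mu$ and
$\mu \ll \lambda$.

(a) If $g \in L^1(\nu)$, then $g\,(d\nu/d\mu) \in L^1(\mu)$ and

\[ \int g\,d\nu = \int g\,\frac{d\nu}{d\mu}\,d\mu. \]

(b)

\[ \frac{d\nu}{d\lambda} = \frac{d\nu}{d\mu}\,\frac{d\mu}{d\lambda} \qquad \lambda\text{-a.e.} \]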
Proof.
(a) If $g = \chi_E$, $E \in \mathcal{M}$, then

\[ \int g\,\frac{d\nu}{d\mu}\,d\mu = \int_E \frac{d\nu}{d\mu}\,d\mu = \nu(E) = \int_E d\nu = \int g\,d\nu. \]

Now use linearity of the integrals to prove the result for simple functions, then
the MCT for nonnegative functions and then linearity again for general $g$.

(b) Pick $E \in \mathcal{M}$ arbitrarily, let $g = \chi_E\,(d\nu/d\mu)$ and plug this into (a), letting $\mu$
and $\lambda$ play the rôle of $\nu$ and $\mu$ respectively. Doing so gives

\[ \int_E \frac{d\nu}{d\mu}\,\frac{d\mu}{d\lambda}\,d\lambda = \int_E \frac{d\nu}{d\mu}\,d\mu = \nu(E) = \int_E \frac{d\nu}{d\lambda}\,d\lambda, \]

where the first equality is by (a) and the other two by definition. By Proposition
6.3, this proves (b). □


Example. If $\nu \ll \mu$ and $\mu \ll \nu$, then $(d\nu/d\mu)(d\mu/d\nu) = 1$ almost everywhere
with respect to any of the two measures. □


7.3 Complex measures


Let $(X, \mathcal{M})$ be a measurable space. A set function $\nu : \mathcal{M} \to \mathbb{C}$ is said to be a
complex measure if it can be written as

\[ \nu = \nu_r + i\nu_i \]

where $\nu_r$ and $\nu_i$ are finite signed measures. We let $L^1(\nu) = L^1(\nu_r) \cap L^1(\nu_i)$ and
for $f \in L^1(\nu)$, we define

\[ \int f\,d\nu = \int f\,d\nu_r + i \int f\,d\nu_i. \]

For two complex measures $\nu$ and $\mu$, we write $\nu \perp \mu$ if $\nu_j \perp \mu_k$ for all four
combinations of $j, k \in \{r, i\}$. If $\mu$ is a positive measure, we write $\nu \ll \mu$ if
$\nu_r \ll \mu$ and $\nu_i \ll \mu$. The Lebesgue-Radon-Nikodym Theorem now goes through
unchanged if the signed measure $\nu$ is replaced with a complex measure.

The total variation of the complex measure $\nu$ is given by

\[ |\nu|(E) = \sup\Big\{ \sum_1^\infty |\nu(F_j)| : F_1, F_2, \ldots \text{ disjoint and } \bigcup_1^\infty F_j = E \Big\}. \]

It is fairly easy to show that $|\nu|$ is a finite measure. It is obvious that $\nu \ll |\nu|$ and
that for positive measures $\mu$ we have $\nu \ll \mu$ iff $|\nu| \ll \mu$.


Proposition 7.13 Let $f = d\nu/d|\nu|$. Then $|f| = 1$ $|\nu|$-a.e.

Proof. On one hand

\[ \Big| \int_E f\,d|\nu| \Big| = |\nu(E)| \le |\nu|(E) = \int_E 1\,d|\nu| \]

for all $E \in \mathcal{M}$, so $|f| \le 1$ a.e. On the other hand, if $|f| < 1$ on a set of positive
measure, then by continuity of measures and separability of $\mathbb{C}$, there must be an
$n \in \mathbb{N}$ and a $z \in \mathbb{C}$ with $|z| < 1 - 2/n$ such that $f \in B_{1/n}(z)$ on a set of positive
measure. Let $E = \{x : f(x) \in B_{1/n}(z)\}$ for such $n$ and $z$. Then for all $F \subseteq E$,

\[ |\nu(F)| = \Big| \int_F f\,d|\nu| \Big| \le \int_F |f|\,d|\nu| \le \Big(1 - \frac{1}{n}\Big)|\nu|(F). \]

Hence for all disjoint $F_1, F_2, \ldots$ whose union is $E$, we get

\[ \sum_j |\nu(F_j)| \le \Big(1 - \frac{1}{n}\Big)|\nu|(E), \]

contradicting the definition of $|\nu|$. □


A few immediate consequences of the definition of the total variation and the
above proposition conclude this section.

• If $f = d\mu/d|\mu|$, then $\int_E |f|\,d|\mu| = |\mu|(E)$ for all $E \in \mathcal{M}$. More generally, if
$\nu \ll \mu$ for a positive measure $\mu$ and $f = d\nu/d\mu$, then $|f| = d|\nu|/d\mu$ $\mu$-a.e.

• If $\nu_1$ and $\nu_2$ are two complex measures, then $|\nu_1 + \nu_2| \le |\nu_1| + |\nu_2|$.


7.4 Differentiation in $\mathbb{R}^n$


In this section, we are going to have $(X, \mathcal{M}, \mu) = (\mathbb{R}^n, \mathcal{B}(\mathbb{R}^n), m)$ for some $n =
1, 2, \ldots$ throughout.

Suppose that $\nu$ is a σ-finite signed measure satisfying $\nu \ll m$. By the Radon-Nikodym
Theorem, $f = d\nu/dm$ exists and satisfies

\[ \int_E f(x)\,dx = \nu(E) \]

for all $E \in \mathcal{B}$. Let

\[ F(x) = \lim_{r \downarrow 0} \frac{\nu(B_r(x))}{m(B_r(x))} = \lim_{r \downarrow 0} \frac{\int_{B_r(x)} f(t)\,dt}{m(B_r(x))}, \]

provided that the limit exists, i.e. $F$ is the limit of the average value of $f$ on $B_r(x)$,
when it exists. Intuitively, one would expect that $F = f$ a.e. Is this true? This
question will be the focus of our attention in this section. Define

\[ A_r f(x) = \frac{\int_{B_r(x)} f(t)\,dt}{m(B_r(x))} \]

so that $F(x) = \lim_{r \downarrow 0} A_r f(x)$ when $f = d\nu/dm$. We define $A_r f(x)$ for
all functions $f$ for which the definition makes sense, i.e. for $f \in L^1_{loc}$, where
$L^1_{loc}$ is the space of all locally integrable functions, i.e. all functions $g$ for which
$\int_K |g(x)|\,dx < \infty$ for all compact $K$. (Note that $L^1_{loc}$ is precisely the space of
functions $g$ for which $\nu(E) = \int_E g(x)\,dx$ defines a σ-finite measure.)


Lemma 7.14 Let $\mathcal{C}$ be a family of open balls in $\mathbb{R}^n$ and let $U$ be the union of all
the sets in $\mathcal{C}$. Then, for any $c < m(U)$, there are disjoint sets $B_1, \ldots, B_k \in \mathcal{C}$ such
that $\sum_1^k m(B_j) > 3^{-n}c$.

Proof. Since $m$ is inner regular, by (4), there exists a compact set $K \subseteq U$ such
that $m(K) > c$. Since $\mathcal{C}$ is an open cover of $K$, there are $A_1, \ldots, A_l \in \mathcal{C}$ such
that $\bigcup_1^l A_i \supseteq K$. Let $B_1$ be the largest of the $A_i$'s (in terms of radius; if there is
more than one ball with the largest radius, then choose arbitrarily). Next let $B_2$
be the largest of the remaining $A_i$'s that does not intersect $B_1$. Then let $B_3$ be the
largest of the now remaining $A_i$'s that does not intersect $B_1$ or $B_2$. Keep on doing
this recursively until no $A_i$ remains that does not intersect any of the chosen $B_j$'s.
Let $k$ be the index of the last ball chosen by this procedure.

Suppose that $A_i$ is one of the $A_i$'s that was not chosen. Then there is a smallest
index $j$ such that $A_i \cap B_j \ne \emptyset$. We must then have that the radius of $A_i$ is at most
as large as the radius of $B_j$, since otherwise $A_i$ would itself have been chosen at
step $j$ or earlier. This means that $A_i \subseteq B_j^*$, where $B_j^*$ is the ball centered at the
same point as $B_j$ and with three times the radius of $B_j$.

Repeating this argument for all $A_i$'s that were not chosen shows that $K \subseteq
\bigcup_1^k B_j^*$. Since $m(B_j^*) = 3^n m(B_j)$ we get

\[ c < m(K) \le \sum_1^k m(B_j^*) = 3^n \sum_1^k m(B_j). \]

□
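The greedy selection in the proof of Lemma 7.14 is easy to carry out for finitely many intervals on the line ($n = 1$). In the sketch below an open ball is represented by a hypothetical `(center, radius)` pair, and going through the balls in order of decreasing radius reproduces the "largest remaining ball that meets none of the chosen ones" rule.

```python
def greedy_disjoint(balls):
    """Greedy selection from a finite list of open intervals (center, radius)."""
    chosen = []
    # Visit balls from largest to smallest radius; keep a ball if it is
    # disjoint from every ball kept so far.
    for c, r in sorted(balls, key=lambda ball: -ball[1]):
        # Open intervals (c - r, c + r) and (c2 - r2, c2 + r2) are disjoint
        # exactly when |c - c2| >= r + r2.
        if all(abs(c - c2) >= r + r2 for c2, r2 in chosen):
            chosen.append((c, r))
    return chosen

def inside_tripled(ball, chosen):
    """True if the ball lies inside some chosen ball blown up by a factor 3."""
    c, r = ball
    return any(abs(c - c2) + r <= 3 * r2 for c2, r2 in chosen)

balls = [(0.0, 1.0), (1.5, 1.0), (0.5, 2.0), (4.0, 0.5), (3.0, 0.7), (10.0, 1.0)]
chosen = greedy_disjoint(balls)
print(chosen)
print(all(inside_tripled(ball, chosen) for ball in balls))
```

Every discarded interval meets a chosen interval of at least its own radius, which is why tripling the chosen ones covers the whole family.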


Lemma 7.15 The function $A_r f(x)$ is continuous in $r$ and $x$.

Proof. Let $c = m(B_1(0))$ so that $m(B_r(x)) = cr^n$. Hence

\[ A_r f(x) = \frac{1}{cr^n} \int_{B_r(x)} f(t)\,dt \]

so that it suffices to check that $\int_{B_r(x)} f(t)\,dt$ is continuous in $(x, r)$. If $(x, r) \to
(x_0, r_0)$, then $\chi_{B_r(x)} \to \chi_{B_{r_0}(x_0)}$ pointwise, except on a subset of the boundary
of $B_{r_0}(x_0)$, a null-set. Also, for $(x, r)$ close enough to $(x_0, r_0)$, all these characteristic
functions are bounded by $\chi_{B_{r_0+1}(x_0)}$, which is an integrable function. Since $f$ is
locally integrable, it now follows from the DCT that

\[ \int_{B_r(x)} f(t)\,dt \to \int_{B_{r_0}(x_0)} f(t)\,dt \]

as desired. □

Next we define the Hardy-Littlewood maximal function, $Hf(x)$.

Definition 7.16 For $f \in L^1$, let

\[ Hf(x) = \sup_{r > 0} A_r |f|(x), \qquad x \in \mathbb{R}^n. \]


Theorem 7.17 (The Maximal Theorem) For f € L! anda > 0, let Ef = {x € 
R” : Hf (x) > a}. Then, for all f and a, 


m(ES) em fir \\dt. 


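Proof. Fix $f$ and $a$. If $E_a^f = \emptyset$, the result is trivial, so assume otherwise. Then,
for each $x \in E_a^f$, pick $r_x > 0$ so that $A_{r_x}|f|(x) > a$. By Lemma 7.14, we can
find $x_1, \ldots, x_k \in E_a^f$ so that the $B_j := B_{r_{x_j}}(x_j)$'s are disjoint and $\sum_{j=1}^k m(B_j) >
3^{-n} m(E_a^f)$. However

\[ \int_{B_j} |f(t)|\,dt = m(B_j) A_{r_{x_j}}|f|(x_j) > a\, m(B_j). \]

Hence

\[ 3^{-n} m(E_a^f) < \sum_{j=1}^k m(B_j) < \frac{1}{a} \sum_{j=1}^k \int_{B_j} |f(t)|\,dt \le \frac{1}{a} \int |f(t)|\,dt. \]

□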
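
As a sanity check of the Maximal Theorem, one can take $n = 1$ and $f = \chi_{[0,1]}$, for which a direct computation gives $Hf(x) = 1/(2x)$ for $x > 1$, $Hf(x) = 1/(2(1 - x))$ for $x < 0$ and $Hf(x) = 1$ on $[0, 1]$. The sketch below approximates $Hf$ on a grid; the grid of radii, the window $[-5, 5]$ and all function names are assumptions made only for this illustration.

```python
def average_indicator(x, r):
    """A_r f(x) for f = indicator of [0, 1] on the real line (n = 1)."""
    overlap = max(0.0, min(1.0, x + r) - max(0.0, x - r))
    return overlap / (2.0 * r)


RADII = [k / 100 for k in range(1, 1001)]  # hypothetical grid 0.01, ..., 10


def maximal_indicator(x):
    """Grid approximation (from below) of Hf(x) = sup_r A_r |f|(x)."""
    return max(average_indicator(x, r) for r in RADII)


def level_set_measure(a, lo=-5.0, hi=5.0, step=0.01):
    """Riemann-sum estimate of m({x in [lo, hi] : Hf(x) > a})."""
    n = round((hi - lo) / step)
    hits = sum(1 for i in range(n) if maximal_indicator(lo + i * step) > a)
    return step * hits


a = 0.25
estimate = level_set_measure(a)
bound = 3 / a  # (3^n / a) * integral of |f|, with n = 1 and integral 1
print(estimate, bound)
```

For $a = 1/4$ the level set is the interval $(-1, 2)$, of measure 3, comfortably below the bound $3/a = 12$.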


We are now ready to show that the limit as $r \to 0$ of $A_r f(x)$ is indeed $f(x)$
for any locally integrable $f$.

Theorem 7.18 If $f \in L^1_{\mathrm{loc}}$, then for a.e. $x \in \mathbb{R}^n$,

\[ \lim_{r \to 0} A_r f(x) = f(x). \]


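Proof. It suffices to prove the result for $x \in [-N, N]^n$ for arbitrarily fixed $N$.
For such $x$ and $r < 1$, the value $A_r f(x)$ only involves $f$ on $[-N-1, N+1]^n$,
and hence we may assume without loss of generality that $f \in L^1$. Then, for any
$\epsilon > 0$, by Theorem 6.11, there exists a continuous integrable function $g$ such that

\[ \int |f(t) - g(t)|\,dt < \epsilon. \]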


Since $g$ is continuous, there is for each $x$ and each $\delta > 0$ an $r > 0$ such that
$|g(t) - g(x)| < \delta$ whenever $|t - x| < r$. For such an $r$ we have

\[ |A_r g(x) - g(x)| = \left| \frac{1}{m(B_r(x))} \int_{B_r(x)} (g(t) - g(x))\,dt \right| < \delta. \]

Hence $A_r g(x) \to g(x)$ as $r \to 0$. From this, it follows that

\begin{align*}
\limsup_{r \to 0} |A_r f(x) - f(x)| &= \limsup_{r \to 0} |A_r(f - g)(x) + (A_r g(x) - g(x)) + (g(x) - f(x))| \\
&\le H(f - g)(x) + |f - g|(x),
\end{align*}


by the triangle inequality and the fact that the middle term of the second expression
vanishes by the above. For $a > 0$, let $E_a = \{x : \limsup_{r \to 0} |A_r f(x) - f(x)| > a\}$.
We want to show that $m(E_a) = 0$ for every $a$. Let $F_a = \{x : |f(x) - g(x)| > a\}$.
By the above inequality,

\[ E_a \subseteq F_{a/2} \cup \{x : H(f - g)(x) > a/2\}. \]

By the Maximal Theorem, the measure of the second set on the right hand side is
bounded by $(2 \cdot 3^n/a) \int |f(t) - g(t)|\,dt < (2 \cdot 3^n/a)\epsilon$. Also, by Markov's inequality,

\[ m(F_{a/2}) \le \frac{2}{a} \int |f(t) - g(t)|\,dt < \frac{2}{a}\epsilon. \]

Hence $m(E_a) \le (2(1 + 3^n)/a)\epsilon$ and since $\epsilon$ was arbitrary, we are done. □

Note that by applying Theorem 7.18 to the function $g(t) = |f(t) - f(x)|$, we
find that also the following slightly stronger statement holds:

\[ \lim_{r \to 0} \frac{1}{m(B_r(x))} \int_{B_r(x)} |f(t) - f(x)|\,dt = 0. \]
The result can be generalized a bit further by replacing the balls $B_r(x)$ by more
general sets. A family of sets $\{E_r\}_{r > 0}$ is said to shrink nicely (or $E_r$ shrinks
nicely) to $x$ if $E_r \subseteq B_r(x)$ for all $r$ and there is an $\alpha > 0$, independent of $r$, such
that $m(E_r) > \alpha\, m(B_r(x))$ for all $r$. Since the average of $|f(t) - f(x)|$ over $E_r$ is
then at most $\alpha^{-1}$ times the average over $B_r(x)$, it is now easy to see that

\[ \lim_{r \to 0} \frac{1}{m(E_r)} \int_{E_r} |f(t) - f(x)|\,dt = 0 \]

whenever $E_r$ shrinks nicely to $x$. As a special case of this, consider a signed
measure $\nu$ on $\mathcal{B}(\mathbb{R}^n)$ such that $|\nu|(K) < \infty$ for all compact $K$ and $\nu \ll m$.
Letting $f = d\nu/dm$, we get that $f \in L^1_{\mathrm{loc}}$ and hence

\[ \lim_{r \to 0} \frac{\nu(E_r)}{m(E_r)} = f(x) \qquad (4) \]

for almost every $x$, whenever $E_r$ shrinks nicely to $x$. In fact, this holds even if $\nu$
is not absolutely continuous w.r.t. $m$. By the LRNT, one can write

\[ \nu(E) = \lambda(E) + \int_E f\,dm, \qquad E \in \mathcal{B}(\mathbb{R}^n), \]

where $\lambda \perp m$ and $f = d\nu/dm$. Using that $\lambda$ lives on a set of $m$-measure 0, one
can show that (4) still holds. (Then, of course, if $x$ is a point for which $\lambda\{x\} > 0$,
this point must belong to the exceptional null-set where (4) is false.)


Theorem 7.19 Let $F : \mathbb{R} \to \mathbb{R}$ be nondecreasing and right continuous. Then the
set of points where $F$ is not continuous is countable and $F$ is differentiable a.e.


Proof. Since, for every $N$,

\[ \sum_{x \in (-N, N]} (F(x+) - F(x-)) \le F(N) - F(-N) < \infty, \]

the first assertion follows. Since $F(x + h) - F(x)$ equals $\mu_F(x, x + h]$ for $h > 0$
and $-\mu_F(x + h, x]$ for $h < 0$ and the sets $(x, x + h]$ and $(x + h, x]$ shrink nicely to
$x$, the second statement now follows from (4). (In fact, it suffices with the $\nu \ll m$
version of (4). Why?) □

7.5 Bounded variation


In this section, we will find the precise conditions for, and the proofs of,
two essential results for integrals, namely the Fundamental Theorem
of Calculus and the Integration by Parts Theorem. Let $F : \mathbb{R} \to \mathbb{C}$.


Definition 7.20 The total variation of $F$, denoted $T_F$, is the function given by

\[ T_F(x) = \sup\Big\{ \sum_{j=1}^n |F(x_j) - F(x_{j-1})| : n \in \mathbb{N},\ -\infty < x_0 < x_1 < \ldots < x_n = x \Big\}. \]


Note that adding an extra $x_j$ on the right hand side of the definition of $T_F$ can only
increase $\sum_j |F(x_j) - F(x_{j-1})|$ for that particular set of $x_j$'s. This means
that when estimating $T_F(b)$ we may always assume that a given point $a < b$ is one
of the $x_j$'s if that is helpful. One consequence is that

\[ T_F(b) - T_F(a) = \sup\Big\{ \sum_{j=1}^n |F(x_j) - F(x_{j-1})| : n \in \mathbb{N},\ a = x_0 < x_1 < \ldots < x_n = b \Big\}. \]


If $\lim_{x \to \infty} T_F(x) < \infty$, we say that $F$ is of bounded variation. Let $BV$ denote
the space of functions $F : \mathbb{R} \to \mathbb{C}$ of bounded variation. By $BV[a, b]$, we denote
the space of $F$'s defined on $[a, b]$ for which $T_F(b) - T_F(a) < \infty$. A function in
$BV[a, b]$ is said to be of bounded variation on $[a, b]$. Here are a few observations.


• If $F \in BV$, then the restriction to $[a, b]$ of $F$ is in $BV[a, b]$.


• If $F \in BV[a, b]$, then the extension of $F$ given by $F(x) = F(a)$ for $x < a$
and $F(x) = F(b)$ for $x > b$, is in $BV$.


• $BV$ is a complex vector space.


• If $F$ is differentiable and $F'$ is bounded, then by the Mean Value Theorem,
$T_F(b) - T_F(a) \le (b - a) \sup_t |F'(t)| < \infty$, and hence $F \in BV[a, b]$ for all
$-\infty < a < b < \infty$.
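
To see the definition and the last observation at work, the following minimal sketch evaluates the variation sums of $F = \sin$ on $[0, 2\pi]$, where $T_F(2\pi) - T_F(0) = 4$ and the Mean Value Theorem bound is $2\pi$. The helper names and the choice of partitions are assumptions made for illustration.

```python
import math


def variation_sum(F, points):
    """Sum of |F(x_j) - F(x_{j-1})| over consecutive partition points."""
    return sum(abs(F(q) - F(p)) for p, q in zip(points, points[1:]))


def uniform_partition(a, b, n):
    """Partition a = x_0 < x_1 < ... < x_n = b with equal spacing."""
    return [a + (b - a) * j / n for j in range(n + 1)]


a, b = 0.0, 2.0 * math.pi
coarse = variation_sum(math.sin, uniform_partition(a, b, 3))
# The 12-point partition refines the 3-point one and contains the
# turning points pi/2 and 3*pi/2 of sin, so it attains the supremum.
refined = variation_sum(math.sin, uniform_partition(a, b, 12))
mvt_bound = (b - a) * 1.0  # (b - a) * sup|F'| with F' = cos
print(coarse, refined, mvt_bound)
```

Refining the partition can only increase the sum, and no partition exceeds the bound $(b - a)\sup_t |F'(t)|$.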


Lemma 7.21 If $F$ is real-valued and $F \in BV$, then $T_F - F$ and $T_F + F$ are
nondecreasing.

Proof. Pick $y$ arbitrarily, pick $\epsilon > 0$ and pick $x < y$. Pick $x_0 < x_1 < \ldots <
x_n = x$ so that $\sum_{j=1}^n |F(x_j) - F(x_{j-1})| > T_F(x) - \epsilon$. Then

\begin{align*}
T_F(y) + F(y) &\ge \sum_{j=1}^n |F(x_j) - F(x_{j-1})| + |F(y) - F(x)| + F(y) \\
&\ge \sum_{j=1}^n |F(x_j) - F(x_{j-1})| + F(x) \\
&> T_F(x) - \epsilon + F(x).
\end{align*}

Since $\epsilon$ was arbitrary, it follows that $T_F + F$ is nondecreasing. The other part is
analogous. □


Theorem 7.22 (a) $F \in BV$ iff $\Re F, \Im F \in BV$.


(b) The real-valued function $F$ is in $BV$ iff $F$ can be written as the difference
between two bounded nondecreasing functions.


(c) If $F \in BV$ is real-valued, then $F(x+)$ and $F(x-)$ exist for all $x$ and
$F(\pm\infty)$ both exist.


(d) If $F \in BV$, then the set of points where $F$ is discontinuous is countable.


(e) If $F \in BV$ is real-valued and right continuous, then $F$ is differentiable a.e.

Proof. Parts (c), (d) and (e) follow from (a), (b) and Theorem 7.19, so it
suffices to prove (a) and (b). Part (a) is obvious, so it remains to prove (b). The
if-direction follows since a bounded nondecreasing function $G$ satisfies
$T_G(x) = G(x) - G(-\infty)$ and hence is in $BV$, and since $BV$ is a vector space
(the third note above). For the only if-direction, write

\[ F = \frac{1}{2}(T_F + F) - \frac{1}{2}(T_F - F), \]

which is by Lemma 7.21 the difference of two nondecreasing functions, which are
bounded since $F \in BV$. □


Let $F \in BV$. If $F$ is real-valued, then writing, as in (b) of the above theorem,
$F = F_1 - F_2$, where $F_1$ and $F_2$ are nondecreasing and bounded, is called
decomposing $F$ into its positive and negative variations. If $F$ is complex-valued, we can
write $F = F_1 - F_2 + i(G_1 - G_2)$, where the $F_j$'s and $G_j$'s are the positive/negative
variations of $\Re F$ and $\Im F$ respectively.

Denote by $NBV$ the space of $F \in BV$ such that $F(-\infty) = 0$ and $F$ is
right continuous. For an $F \in NBV$, the functions $F_1$, $F_2$, $G_1$ and $G_2$ are all
right continuous. Hence we can define the complex measure $\mu_F$ given by

\[ \mu_F = \mu_{F_1} - \mu_{F_2} + i(\mu_{G_1} - \mu_{G_2}). \]


Proposition 7.23 If $F \in NBV$, then $F' \in L^1(m)$. Moreover $\mu_F \perp m$ iff $F' = 0$
a.e. and $\mu_F \ll m$ iff $F(x) = \int_{-\infty}^x F'(t)\,dt$.


Note. Theorem 7.22(e) guarantees that $F'(x)$ exists for almost every $x$, so the
present proposition should be read with the understanding that $F'$ is extended by
defining it arbitrarily on the exceptional null-set.

Proof. By the definition of derivative, $F'(x) = \lim_{r \to 0} (\mu_F(E_r)/m(E_r))$,
where $E_r = (x, x + r]$ for $r > 0$ and $E_r = (x + r, x]$ for $r < 0$. By the
observations following Theorem 7.18, $F'(x) = d\mu_F/dm$ a.e. By the LRNT, this
entails that

\[ F(x) = \lambda(-\infty, x] + \int_{-\infty}^x F'(t)\,dt, \]

where $\lambda \perp m$, and $F'$ must be in $L^1(m)$ since $F$ is bounded, by virtue of
being of bounded variation, so that $\mu_F$ is finite. □


One part of Proposition 7.23 is that the Fundamental Theorem of Calculus
holds for $F \in NBV$ defined on the whole real line, such that $\mu_F \ll m$. Can
the latter criterion be stated more directly in terms of $F$
itself? The answer is yes:


Definition 7.24 A function $F : \mathbb{R} \to \mathbb{C}$ is said to be absolutely continuous if
for all $\epsilon > 0$ there exists a $\delta > 0$ such that $\sum_{j=1}^n |F(b_j) - F(a_j)| < \epsilon$ whenever
$a_1 < b_1 < a_2 < \ldots < b_n$ and $\sum_{j=1}^n (b_j - a_j) < \delta$.


Note that absolute continuity is stronger than uniform continuity (and thus
stronger than continuity), since uniform continuity follows from taking $n = 1$ in
the definition of absolute continuity. We say that $F$ is absolutely continuous on
$[a, b]$ if it satisfies the definition restricted to $a \le a_j, b_j \le b$.


Example. If $F$ is differentiable everywhere and $F'$ is bounded, then by the Mean
Value Theorem, $|F(b_j) - F(a_j)| \le \sup_x |F'(x)|\,(b_j - a_j)$ for any $a_j, b_j$, so $F$
is absolutely continuous. □

Proposition 7.25 If $F \in NBV$, then $F$ is absolutely continuous iff $\mu_F \ll m$.


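Proof. If $\mu_F \ll m$, then we claim that for each $\epsilon > 0$ there is a $\delta > 0$
such that $\mu_F(E) < \epsilon$ whenever $m(E) < \delta$. It suffices to prove the claim for
positive $\mu_F$. Suppose for contradiction that there are $E_k$ such that $m(E_k) < 2^{-k}$
but $\mu_F(E_k) \ge \epsilon$. By Borel-Cantelli, $m(\limsup_k E_k) = 0$. However, for each $n$,
$\mu_F(\bigcup_{k \ge n} E_k) \ge \epsilon$. Since $F \in NBV$, $\mu_F$ is finite, so it follows from continuity of
measures that $\mu_F(\limsup_k E_k) \ge \epsilon$, contradicting that $\mu_F \ll m$. Applying the claim
to sets of the form $E = \bigcup_{j=1}^n (a_j, b_j]$ shows that $F$ is absolutely continuous.

For the only-if direction, pick $E$ so that $m(E) = 0$, pick $\epsilon > 0$ and a
corresponding $\delta$ according to the definition of absolute continuity. By outer regularity
of $m$ and $\mu_F$ there are open sets $U_1 \supseteq U_2 \supseteq \ldots \supseteq E$ such that $m(U_1) < \delta$ and
$\mu_F(U_j) \to \mu_F(E)$. Each $U_j$ can be written as a countable disjoint union of intervals:

\[ U_j = \bigcup_k (a_j^k, b_j^k). \]

It follows from the absolute continuity of $F$, since $\sum_k (b_j^k - a_j^k) < \delta$ for each $j$,
that

\[ |\mu_F(U_j)| \le \lim_{N \to \infty} \sum_{k=1}^N |\mu_F(a_j^k, b_j^k)| \le \lim_{N \to \infty} \sum_{k=1}^N |F(b_j^k) - F(a_j^k)| \le \epsilon. \]

Hence $|\mu_F(E)| \le \epsilon$ and, $\epsilon$ being arbitrary, $\mu_F(E) = 0$, proving that $\mu_F \ll m$. □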


Remark. It may come as a surprise that continuity of $F$ is not sufficient for
$\mu_F \ll m$. However, consider the Cantor set $C$ on $[0, 1]$. As in Section 2, represent
each number $x \in [0, 1]$ by its ternary expansion

\[ x = \sum_{n=1}^\infty a_n(x) 3^{-n}, \]

$a_n(x) \in \{0, 1, 2\}$. For $x \in C$, let $b_n(x) = a_n(x)/2$ (recall that $a_n(x) \in \{0, 2\}$
whenever $x \in C$). Let $F(x) = \sum_{n=1}^\infty b_n(x) 2^{-n}$. Extend $F$ to a function on $[0, 1]$ by
letting $F(x) = \sup\{F(c) : c \in C, c \le x\}$. Then $F$ is a.e. locally constant, in the sense
that for any $x \notin C$, there is an open interval containing $x$ on which $F$ is constant.
Nevertheless, $F(0) = 0$ and $F(1) = 1$. Since $F$ is nondecreasing and $F[0, 1] = [0, 1]$,
$F$ is continuous. The measure $\mu_F$ however, is concentrated on $C$. Thus $\mu_F \perp m$,
despite $F$ being continuous. The function $F$ is known as the Cantor function.
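
The construction in the remark translates directly into a digit-by-digit computation. The following is a minimal sketch using exact rational arithmetic; the function name and the truncation parameter `n_digits` are assumptions made for illustration.

```python
from fractions import Fraction


def cantor_function(x, n_digits=40):
    """Approximate the Cantor function F on [0, 1] from ternary digits.

    Exact whenever the ternary expansion of x terminates or hits a digit 1
    within n_digits digits; otherwise truncated after n_digits digits.
    """
    x = Fraction(x)
    if x >= 1:
        return Fraction(1)
    total, weight = Fraction(0), Fraction(1, 2)
    for _ in range(n_digits):
        x *= 3
        digit = int(x)  # next ternary digit a_n(x), in {0, 1, 2}
        x -= digit
        if digit == 1:
            # x lies in the closure of a removed middle third,
            # on which F is constant.
            return total + weight
        total += weight * (digit // 2)  # b_n(x) = a_n(x) / 2
        weight /= 2
    return total


for x in (Fraction(0), Fraction(1, 9), Fraction(1, 3), Fraction(1, 2),
          Fraction(2, 3), Fraction(1)):
    print(x, cantor_function(x))
```

The values at $1/3$, $1/2$ and $2/3$ coincide, illustrating that $F$ is constant on the removed middle third while still climbing from 0 to 1.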


So, by Propositions 7.23 and 7.25, for functions $F \in NBV$, absolute continuity of $F$
implies that $F(x) = \int_{-\infty}^x F'(t)\,dt$. For $F$ defined on an interval $[a, b]$ (or with $F(x) =
F(a)$ for $x < a$ and $F(x) = F(b)$ for $x > b$), things are even a bit better.


Lemma 7.26 If $F$ is absolutely continuous on $[a, b]$, then $F \in BV[a, b]$.


Proof. Take $\epsilon = 1$ in the definition of absolute continuity of $F$ and pick $\delta$
accordingly. Let $N = \lfloor (b - a)/\delta \rfloor + 1$. For any given $a = x_0 < x_1 < \ldots < x_n = b$,
group the intervals $(x_{j-1}, x_j]$ into $N$ groups such that the total length of each group
is less than $\delta$; this can be done by the choice of $N$, at least after adding some extra
$x_j$'s. Hence the sum of the $|F(x_j) - F(x_{j-1})|$'s over each group is bounded by 1,
so

\[ \sum_{j=1}^n |F(x_j) - F(x_{j-1})| \le N. \]

Since the $x_j$'s were arbitrary, this shows that $T_F(b) - T_F(a) \le N$, in particular $F \in
BV[a, b]$. □


Summing up, we get 


Theorem 7.27 (The Fundamental Theorem of Calculus) Let $-\infty < a < b <
\infty$. Then $F : [a, b] \to \mathbb{C}$ is absolutely continuous iff $F \in BV[a, b]$, $F$ is
differentiable a.e., $F' \in L^1([a, b], \mathcal{L}, m)$ and

\[ F(x) - F(a) = \int_a^x F'(t)\,dt \]

for every $x \in [a, b]$.


Next we consider integration by parts. For $F \in NBV$, write $\int_E f\,dF$ for
$\int_E f\,d\mu_F$.

Theorem 7.28 (Integration by Parts) Let $F, G \in NBV$ and assume that $G$ is
continuous. Let $-\infty < a < b < \infty$. Then

\[ \int_{(a,b]} F\,dG + \int_{(a,b]} G\,dF = F(b)G(b) - F(a)G(a). \]

Proof. By Theorem 7.22 parts (a) and (b), it suffices to do this for $F$ and $G$
nondecreasing. Let $\Omega = \{(x, y) : a < x \le y \le b\}$. By Tonelli, we have on one hand
that

\begin{align*}
(\mu_F \times \mu_G)(\Omega) &= \int_{(a,b]} \int_{(a,y]} dF(x)\,dG(y) \\
&= \int_{(a,b]} (F(y) - F(a))\,dG(y) \\
&= \int_{(a,b]} F\,dG - F(a)(G(b) - G(a)),
\end{align*}

and on the other hand

\begin{align*}
(\mu_F \times \mu_G)(\Omega) &= \int_{(a,b]} \int_{[x,b]} dG(y)\,dF(x) \\
&= \int_{(a,b]} (G(b) - G(x))\,dF(x) \\
&= G(b)(F(b) - F(a)) - \int_{(a,b]} G\,dF,
\end{align*}

where the second equality requires that $G$ is continuous. Equating the two
expressions gives the result. □


8 The law of large numbers 

This section is devoted to proving the strong version of the Law of Large Numbers. 
Of course, there is a probability space (X, M, P) underlying all statements made. 
We begin with a fundamental observation. 


Proposition 8.1 Let $\xi$ and $\eta$ be independent integrable random variables. Then

\[ E[\xi\eta] = E[\xi]E[\eta]. \]


Proof. If $\xi$ and $\eta$ are simple functions, then the result follows directly from
the definition of independence and easy algebraic manipulation. If $\xi$ and $\eta$ are
positive, then let sequences of simple functions increase to $\xi$ and $\eta$ respectively.
Choose the sequences so that the simple functions are $\sigma(\xi)$- and $\sigma(\eta)$-measurable
respectively (which is what one gets if one uses the basic construction of such
simple functions). Then all functions of the first sequence are independent of
all functions of the second sequence, by being functions of independent random
variables. The result now follows for positive functions from the MCT. Finally the
general result follows from linearity of integrals, on splitting $\xi$ and $\eta$ into
positive and negative parts. □


The weak Law of Large Numbers is very easy to prove and goes as follows.
Here and in the sequel the abbreviation "iid" stands for "independent and
identically distributed". Also, for a sequence of real numbers $x_1, x_2, \ldots$, the quantity
$\bar{x}_n$ denotes the average of the first $n$ $x_j$'s, i.e. $\bar{x}_n = n^{-1} \sum_{j=1}^n x_j$.


Theorem 8.2 (Weak Law of Large Numbers) Assume that $\xi_1, \xi_2, \ldots$ are iid
random variables such that $E[\xi_1] = 0$ and $E[\xi_1^2] = M_2 < \infty$. Then for any $\epsilon > 0$,

\[ \lim_{n \to \infty} P(|\bar{\xi}_n| > \epsilon) = 0. \]


n 


Proof. By the above proposition, $E[\bar{\xi}_n^2] = n^{-1} E[\xi_1^2]$. Hence, by Markov's
inequality,

\[ P(|\bar{\xi}_n| > \epsilon) = P(\bar{\xi}_n^2 > \epsilon^2) \le \frac{M_2}{n\epsilon^2}, \]

which tends to 0 as $n \to \infty$. □
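
The bound in the proof can be compared with an exact probability in a simple special case. The sketch below assumes fair $\pm 1$ coin flips (so $M_2 = 1$), for which the tail probability is a finite binomial sum; the function names and the choice $\epsilon = 1/10$ are illustrative assumptions.

```python
from fractions import Fraction
from math import comb


def exact_tail(n, eps):
    """P(|mean of n fair +-1 coin flips| > eps), computed exactly."""
    # The sum equals 2k - n when exactly k of the flips are +1.
    favourable = sum(
        comb(n, k) for k in range(n + 1) if abs(2 * k - n) > eps * n
    )
    return Fraction(favourable, 2 ** n)


def markov_bound(n, eps, m2=1):
    """The bound M_2 / (n eps^2) from the proof; m2 = E[xi_1^2] = 1 here."""
    return Fraction(m2) / (n * eps ** 2)


eps = Fraction(1, 10)
for n in (10, 100, 1000):
    print(n, float(exact_tail(n, eps)), float(markov_bound(n, eps)))
```

The exact tail probabilities stay below the bound and decrease with $n$, though the bound itself is far from sharp.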


Obviously, if $E[\xi_1] = \nu \ne 0$, then applying the result to $\xi_j - \nu$ gives that
$P(|\bar{\xi}_n - \nu| > \epsilon) \to 0$. The strong law will do away with the assumption of
finite second moment and also prove that $\bar{\xi}_n \to 0$ a.s., which is clearly a stronger
result in both aspects. As for the weak law, it is obviously sufficient to consider
the case $E[\xi_1] = 0$.

The strong law has a reputation of having a very involved proof. This is not
entirely correct. Granted, compared to the weak law it is involved, but compared
to other fundamental mathematical results it is certainly not. Here we will present
the "elementary proof"; the other standard proof uses martingale theory, which is
not a topic of this course.

Let us begin with a short and elegant proof of a.s. convergence under the
assumption of bounded fourth moment. The proof of the full strong law does not
rely on this result, so we may regard it as a side track. On the other hand, it is
more general in that it does not assume iid random variables, only that they are
independent and have the same expectation.

Theorem 8.3 (Law of Large Numbers under Bounded 4th Moment (LLN(4)))
Let $\xi_1, \xi_2, \ldots$ be independent random variables such that $E[\xi_j] = 0$ for all
$j$ and such that there exists $M_4 < \infty$ such that $E[\xi_j^4] \le M_4$ for all $j$. Then
$\lim_n \bar{\xi}_n = 0$ a.s.


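Proof. Let $S_n = \sum_{j=1}^n \xi_j$. Then

\[ E[S_n^4] = \sum_{j=1}^n E[\xi_j^4] + 6 \sum_{i < j} E[\xi_i^2] E[\xi_j^2], \]

since the other terms of the expansion of $S_n^4$ have expectation 0 by assumption and
the above proposition. Now suppose $\eta$ is an integrable positive random variable
and let $\nu := E[\eta]$. Then $0 \le \int (\eta - \nu)^2\,dP = \int \eta^2 - 2\nu \int \eta + \nu^2 = E[\eta^2] - E[\eta]^2$.
Apply this to $\eta = \xi_j^2$ to get that $E[\xi_j^2] \le E[\xi_j^4]^{1/2} \le M_4^{1/2}$. Hence

\[ E[S_n^4] \le \Big(n + 6\binom{n}{2}\Big) M_4 \le 3n^2 M_4. \]

Therefore $E[(S_n/n)^4] \le 3n^{-2} M_4$, so

\[ E\Big[\sum_{n=1}^\infty \Big(\frac{S_n}{n}\Big)^4\Big] < \infty, \]

which in particular entails that $(S_n/n)^4 \to 0$ a.s. □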
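
The moment computation can be checked exactly in the special case of independent fair $\pm 1$ signs, where $E[\xi_j^2] = E[\xi_j^4] = 1$, so that $M_4 = 1$ and the expansion gives $E[S_n^4] = n + 6\binom{n}{2} = 3n^2 - 2n$. This is a minimal sketch under that assumption; the function name is illustrative.

```python
from fractions import Fraction
from math import comb


def fourth_moment(n):
    """E[S_n^4] for S_n a sum of n independent fair +-1 signs, exactly."""
    # S_n = 2k - n when exactly k of the signs equal +1.
    weighted = sum(comb(n, k) * (2 * k - n) ** 4 for k in range(n + 1))
    return Fraction(weighted, 2 ** n)


for n in (1, 2, 5, 10):
    print(n, fourth_moment(n), 3 * n * n)
```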


Lemma 8.4 Let $\xi_1, \xi_2, \ldots$ be independent random variables with $E[\xi_j] = 0$ and
$\sum_{j=1}^\infty E[\xi_j^2] < \infty$. Then $\sum_{j=1}^n \xi_j$ converges as $n \to \infty$ a.s.


Proof. Let $M := \sum_{j=1}^\infty E[\xi_j^2]$. Let $S_n = \sum_{j=1}^n \xi_j$, so that $S_0 = 0$. Fix two rational
numbers $a < b$ and let $U_n$ be the number of up-crossings of $(a, b)$ of $S_0, S_1, \ldots, S_n$, i.e.

\[ U_n = \max\{k : \exists\, 0 \le s_1 < t_1 < s_2 < t_2 < \ldots < s_k < t_k \le n \text{ with } S_{s_i} < a,\ S_{t_i} > b,\ i = 1, \ldots, k\}. \]

Define the 0/1-random variables $C_1, C_2, \ldots$ by taking $C_1 = 1$ if $a > 0$ and $C_1 = 0$
otherwise and then recursively

\[ C_n = \chi_{\{C_{n-1} = 1,\ S_{n-1} \le b\} \cup \{C_{n-1} = 0,\ S_{n-1} < a\}}. \]

Let $T_n = \sum_{j=1}^n C_j \xi_j$. Since each $C_n$ is $\sigma(\xi_1, \ldots, \xi_{n-1})$-measurable, $C_n$ and $\xi_n$ are
independent and hence $E[T_n] = 0$. However

\[ T_n \ge (b - a) U_n - |S_n - a|, \]

so the expectation of the right hand side is at most 0. Hence

\[ E[U_n] \le \frac{E[|S_n - a|]}{b - a} \le \frac{|a| + E[S_n^2]^{1/2}}{b - a} \le \frac{|a| + M^{1/2}}{b - a}. \]

Letting $U_\infty = \lim_n U_n$, the MCT gives $E[U_\infty] < \infty$, so that $U_\infty < \infty$ a.s. By
countable subadditivity of measures, this holds simultaneously for all rational $a$ and $b$.
Hence the sequence $\{S_n\}$ a.s. has only finitely many up-crossings of all nonempty
intervals, which means that either $S_n$ converges or $|S_n| \to \infty$. In either case
$\lim_n |S_n|$ exists, but may be infinite. However, by Fatou's Lemma,

\[ E[\lim_n |S_n|] \le \liminf_n E[|S_n|] \le \liminf_n E[S_n^2]^{1/2} = \liminf_n \Big(\sum_{j=1}^n E[\xi_j^2]\Big)^{1/2} \le M^{1/2}, \]

where the last equality follows from independence and the final inequality by
assumption. Hence $\lim_n |S_n| < \infty$ a.s. □


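Lemma 8.5 (Cesàro's Lemma) Suppose that $v_1, v_2, \ldots$ is a sequence of real
numbers such that $\lim_n v_n = v_\infty$. Then $\lim_n \bar{v}_n = v_\infty$.


Proof. Fix $\epsilon > 0$ and $N$ so large that $n \ge N \Rightarrow |v_n - v_\infty| < \epsilon$. Then for $n > N$,

\[ \bar{v}_n \ge \frac{1}{n} \sum_{j=1}^N v_j + \frac{n - N}{n}(v_\infty - \epsilon) \to v_\infty - \epsilon \]

and

\[ \bar{v}_n \le \frac{1}{n} \sum_{j=1}^N v_j + \frac{n - N}{n}(v_\infty + \epsilon) \to v_\infty + \epsilon \]

as $n \to \infty$. Since $\epsilon$ was arbitrary, the result follows. □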
Lemma 8.6 (Kronecker's Lemma) Suppose $x_1, x_2, \ldots$ are real numbers such
that $\sum_{j=1}^n (x_j/j)$ converges as $n \to \infty$. Then $\lim_n \bar{x}_n = 0$.

Proof. Let $v_n = \sum_{j=1}^n (x_j/j)$, $v_0 = 0$ and $v_\infty = \lim_n v_n$. With this notation, we get

\[ \sum_{j=1}^n x_j = \sum_{j=1}^n j(v_j - v_{j-1}) = n v_n - \sum_{j=1}^n v_{j-1}. \]

Hence

\[ \bar{x}_n = v_n - \frac{1}{n} \sum_{j=1}^n v_{j-1} \to 0 \]

by Cesàro's Lemma. □
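
Kronecker's Lemma can be illustrated numerically with the unbounded sequence $x_j = (-1)^j \sqrt{j}$, for which $\sum_j x_j/j = \sum_j (-1)^j/\sqrt{j}$ converges as an alternating series. The sequence and the function name below are assumptions chosen for this sketch.

```python
import math


def kronecker_demo(n_terms):
    """Return (sum of x_j / j, average of x_1..x_n) for x_j = (-1)^j sqrt(j)."""
    series, total = 0.0, 0.0
    for j in range(1, n_terms + 1):
        x_j = (-1) ** j * math.sqrt(j)
        series += x_j / j  # v_n = sum of x_j / j, an alternating series
        total += x_j       # n times the average of x_1, ..., x_n
    return series, total / n_terms


for n in (10, 1000, 100000):
    print(n, kronecker_demo(n))
```

The first component settles down as $n$ grows while the second tends to 0, even though $|x_j| \to \infty$.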


The next step is the strong law under a mild variance restriction. 


Theorem 8.7 (Law of Large Numbers under Variance Restriction (LLN(V)))
Let $w_1, w_2, \ldots$ be independent random variables with $E[w_j] = 0$ for all $j$ and
$\sum_{n=1}^\infty (E[w_n^2]/n^2) < \infty$. Then $\lim_n \bar{w}_n = 0$ a.s.


Proof. By Kronecker's Lemma, it suffices to prove that $\sum_{j=1}^n (w_j/j)$ converges
as $n \to \infty$ a.s. This in turn follows from Lemma 8.4 on taking $\xi_n = w_n/n$. □


Lemma 8.8 (Kolmogorov's Truncation Lemma (KTL)) Let $\xi_1, \xi_2, \ldots$ be iid
random variables with $E[\xi_j] = 0$ for all $j$. Let $\eta_j = \xi_j \chi_{\{|\xi_j| \le j\}}$. Then


(a) $\lim_n E[\eta_n] = 0$,
(b) $P(\limsup_n \{x : \xi_n(x) \ne \eta_n(x)\}) = 0$,
(c) $\sum_{n=1}^\infty (E[\eta_n^2]/n^2) < \infty$.


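Proof. Since $\eta_n$ has the same distribution as $\xi_1 \chi_{\{|\xi_1| \le n\}}$, which converges
pointwise to $\xi_1$, it follows by the DCT, using $|\xi_1|$ as a dominating $L^1$ function, that

\[ E[\eta_n] \to E[\xi_1] = 0. \]

This proves (a). For (b):

\[ \sum_{n=1}^\infty P(\eta_n \ne \xi_n) = \sum_{n=1}^\infty P(|\xi_1| > n) = E\Big[\sum_{n=1}^\infty \chi_{\{|\xi_1| > n\}}\Big] \le E[|\xi_1|] < \infty, \]

where the second equality follows from the MCT. Hence (b) follows from the
Borel-Cantelli Lemma. For (c):

\begin{align*}
\sum_{n=1}^\infty \frac{E[\eta_n^2]}{n^2} &= \sum_{n=1}^\infty \frac{E[\xi_1^2 \chi_{\{|\xi_1| \le n\}}]}{n^2} \\
&= E\Big[\xi_1^2 \sum_{n \ge \max(1, |\xi_1|)} \frac{1}{n^2}\Big] \\
&\le E\Big[\xi_1^2 \cdot \frac{2}{\max(1, |\xi_1|)}\Big] \\
&\le 2E[|\xi_1|] < \infty,
\end{align*}

where the second equality follows from the MCT and the first inequality from the
bound $\sum_{n \ge m} n^{-2} \le 2/m$, valid for integers $m \ge 1$. □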


Theorem 8.9 (The Law of Large Numbers) Let $\xi_1, \xi_2, \ldots$ be iid random
variables with $E[\xi_1] = 0$. Then

\[ \lim_{n \to \infty} \bar{\xi}_n = 0 \]

almost surely.


Proof. Let $\eta_n = \xi_n \chi_{\{|\xi_n| \le n\}}$. By (c) of KTL and LLN(V), applied to
$w_j = \eta_j - E[\eta_j]$, almost surely,

\[ \lim_{n \to \infty} \frac{1}{n} \sum_{j=1}^n (\eta_j - E[\eta_j]) = 0. \]

By KTL (a) $E[\eta_n] \to 0$, so by Cesàro's Lemma, $n^{-1} \sum_{j=1}^n E[\eta_j] \to 0$. Hence almost
surely,

\[ \lim_{n \to \infty} \bar{\eta}_n = 0. \]

Finally, by KTL (b), almost surely $\eta_n \ne \xi_n$ for only finitely many $n$, so

\[ \lim_{n \to \infty} \bar{\xi}_n = 0 \]

almost surely. □